PSYCHOLOGICAL 

STATISTICS 

Second Edition

QUINN McNEMAR 


PROFESSOR OF PSYCHOLOGY,
STATISTICS, AND EDUCATION 
STANFORD UNIVERSITY 



COPYRIGHT, 1949, 1955,
BY 

John Wiley & Sons, Inc.


All Rights Reserved 

This book or any part thereof must not
be reproduced in any form without
the written permission of the publisher.


SECOND EDITION
Second printing, December, 1957


Library of Congress Catalog Card Number: 55-5320 


PRINTED IN THE UNITED STATES OF AMERICA 



Preface 


The widespread adoption of the first edition of this textbook
suggests that it has been found useful in introductory courses
although it was not written primarily for that level. In this revised
and enlarged edition the elementary treatment of statistical
inference has been expanded so as to make the book more palatable
to the beginning student. An attempt has been made in Chapter 5
to introduce the student to the general problems associated with
hypothesis testing. These concepts are further developed and
extended to continuous variables for the large-sample situation in
Chapter 6 and for small samples in Chapter 7. It is my firm
conviction that it is better pedagogically to provide a separate
treatment of small-sample techniques, and in this order.

In the development of the logic of statistical inference in
Chapter 5, more use has been made of the binomial distribution, and
some of the Neyman-Pearson principles of hypothesis testing have
been introduced, with the notion of point estimation and
confidence intervals postponed to Chapter 6.

The five chapters devoted to correlational analysis contain
numerous minor revisions and extensions. The previous
introduction to chi square has been replaced by a binomial approach,
and a method for handling several correlated proportions and the
exact probability method for fourfold tables have been added. A
short chapter presents methods for comparing both correlated
and independent variabilities, including Bartlett's test for
homogeneity of variance and the F distribution.

The first chapter on the analysis of variance is essentially
unchanged, whereas the second (Chapter 16) has been drastically
revised in the direction of presenting the underlying models and
their implication for the proper error term, or denominator for F.
Assumptions are made more explicit.

Some will be critical of Chapter 18 because it does not contain
all the several so-called nonparametric techniques. My choice
from among them was based in part on computational facility
and lack of dependence on special tables. Furthermore, I have
been unable to find clear-cut information concerning the relative
efficiency of the many proposed techniques. Until such time as
the mathematical statistician succeeds in evaluating their
relative merits it seems unwise to confront the student with an
aggregation of distribution-free methods.

As in the first edition, I have aimed for conciseness, with stress 
on assumptions and interpretations instead of on routine compu- 
tational procedures. Some derivations have been included for the 
dual purpose of clarifying concepts and stimulating the mathe- 
matically inclined. 

It is impossible to disentangle and acknowledge all the factors
that have contributed to the content and writing of the first
edition and this revision. Some will perhaps recognize the influence
of two of my teachers, Professors Truman L. Kelley and Harold
Hotelling. My greatest personal indebtedness is to Olga W.
McNemar, who has done much to clarify the exposition and rid the
volume of errors.

I am indebted to Professor Ronald A. Fisher and Dr. Frank
Yates, also to Messrs. Oliver and Boyd Limited, Edinburgh, for
permission to reprint Tables III, IV, V, and VII from their book
“Statistical Tables for Biological, Agricultural and Medical
Research.”

Quinn McNemar

Palo Alto

July



Contents 


1 • Introduction
2 • Tabular and graphic methods
3 • Describing frequency distributions
4 • Distribution curves
5 • Probability and hypothesis testing
6 • Inference: Continuous variables
7 • Small sample or t technique
8 • Correlation: Introduction and computation
9 • Correlation: Interpretations and assumptions
10 • Factors which affect the correlation coefficient
11 • Multiple correlation
12 • Other correlation methods
13 • Frequency comparison: Chi square
14 • Comparison of variabilities
15 • Analysis of variance: Simple
16 • Analysis of variance: Complex
17 • Analysis of variance: Covariance method
18 • Distribution-free methods
19 • Remarks on error reduction
Exercises
Appendix (Tables)
Index




CHAPTER 1 


Introduction


Statistical methods are concerned with the reduction of either
large or small masses of data to a few convenient descriptive terms
and with the drawing of inferences therefrom. The data are
collected by any of several methods of research with the aid of
measuring devices appropriate to a given area of investigation. The
research methods are variously named and classified. Thus in
psychology we have methods which are labeled experimental,
clinical, observational, etc. The devices for measuring or securing
responses vary from those which involve delicate apparatus through
paper-and-pencil schemes to controlled observations and
interviews. Statistical techniques are not to be considered as
coordinate either with research methods or with devices for obtaining
and recording responses, but rather as tools for analyzing data
collected by whatever means.

The reduction of a batch of data to a few descriptive measures
is the part of statistical analysis which should lead one to a better
over-all comprehension of the data. All readers will be more or
less familiar with the concept of average. An average is a measure
which describes what is typical of a group with respect to some
trait, characteristic, or variable. If one is comparing two or more
groups, the determination of an average for each group permits a
better appraisal of possible group differences than would be
obtained by casual examination of the data. There are various
statistical measures, or types of averages, which have proven
useful as descriptive terms for a variety of data. One aim of this
book is to present and discuss the descriptive statistical measures
most frequently needed in psychological research. Proper usage
and interpretation of these terms and evaluation of their use by
others are not possible without knowledge of their meaning and
their limiting assumptions. Incidentally, the user of statistical
measures must give some thought to computational procedures.

As we proceed it will be necessary not only to define descriptive
measures but also to distinguish between the usage of a given
measure as being descriptive of a sample as opposed to a population.
Since sample descriptive statistics are knowns (i.e., computable)
whereas the corresponding population values are unknowns (but
estimable), we will in this book define and discuss the descriptive
measures in terms of samples and subsequently consider the
problem of drawing inferences about, or estimating, population values.
Sample values are frequently referred to as statistics and population
values are called parameters.

That part of statistical analysis which has to do with the
drawing of inferences is imposed upon us because of certain inadequacies
of research data. For instance, an investigator who wishes to
know the average height of adult women in the United States will
never have facilities for measuring every woman. Accordingly,
he is compelled to measure a sample of women; then on the basis
of information yielded by the sample he can make an inference
concerning the average height of the population of women.
Another investigator, wishing to evaluate the relative merits of
2 learning methods, tries out the methods with 2 small groups
of students, and from the results an inference is made concerning
what might be expected if he had facilities for working with very
large groups. An opinion poller may seek information about the
reactions of Republicans and Democrats to some world event.
By questioning a sample of each group he can secure sufficient
data for drawing an inference regarding a possible difference
between the population of Republicans and the population of
Democrats.

The problem of statistical inference is usually that of
determining whether statistical significance can be attached to results
after due allowance is made for known sources of error. There
are many and varied situations for which we need tests of
significance, and accordingly several tests are available. Intelligent
and critical inferences cannot be made by those who do not
understand the purposes, assumptions, and applicability of the various
techniques for judging significance.

It is in connection with the problem of drawing inferences that
a knowledge of statistical methods is most helpful. A research
should be planned in such a way that the resulting data are
amenable to treatment by the available statistical techniques. With
sufficient information concerning these techniques of analysis, one
should be able to lay out in advance of data collecting the main
types of statistical analysis to be used. If a proposed
experimental setup precludes the possibility of adequate analysis, it
may be found that a slight alteration in the plan will remedy the
situation. All too frequently the statistician is called in to help
with data which have not been collected in such a manner as to
permit efficient analysis. Only by knowing the available methods
of analysis can one plan a research with assurance that the results
can be handled statistically.

Another reason for keeping in mind statistical considerations
while planning a research is the fact that some experimental
designs are preferable because they permit, with small additional
cost, or even at a saving, better control of error than other plans.
Indeed, certain designs lead to a marked reduction in known
sources of error.

A third reason for planning with foresight regarding the
statistical analysis is that a set of data can sometimes be made to serve
for checking several different hypotheses.

The student should be warned that he cannot expect miracles
to be wrought by the use of statistical tools. Although statistical
methods have an important place in present-day psychological
research, it does not follow that they can be utilized to salvage
data that result from a haphazardly planned and sloppily executed
investigation. No amount of statistical juggling can transfigure
bad data into acceptable form. It is doubtful whether the student
who comes to the statistician with a batch of data and the question,
“Can I compute a correlation coefficient?” will make a
scientific contribution, but such a student deserves sympathy,
especially if his major advisor has suggested that he need not
worry about statistics until he has collected data.

The purpose of the present book is to acquaint the student with
the statistical techniques commonly used, to suggest economical
computational procedures, and to state the assumptions and
limitations of the various techniques. Whenever the understanding
of a particular technique can be clarified by a simple derivation,
such a derivation will be given. Unfortunately, many of the
derivations are too complicated mathematically to permit
consideration in an elementary or intermediate treatment. The
qualified and interested student will find some of these derivations
in more advanced textbooks and others in original sources.

Statistical methods belong in the realm of applied mathematics,
and consequently extensive scholarship in mathematics is required
of those who choose to specialize in statistics. One can, however,
secure a practical working knowledge of statistical techniques
without first becoming a mathematician, provided his deficiency
in mathematics is not accompanied by an emotional reaction to
symbols.

Within the realm of psychological research there is wide
variation in the need for statistical procedures. One can find current
research reports which involve no use of statistics, some which
involve very simple statistical treatment, still others which lean
heavily on the tools of statistics, and a few which are highly
statistical. One need not shift from one area of investigation to another
to find this variation, but it is true that certain areas of research
in psychology have less dependency than others on statistical
procedures. The area of psychology which seems most dependent
upon statistics is psychological measurement. This dependency
is due mainly to the very nature of psychological measurement,
the theory of which is largely statistical.

The presence or absence of statistical analysis per se is not a
safe criterion for judging the worth of a study: some studies
would have been improved by the utilization of statistics, whereas
others would be better if they had been so designed as to depend
less upon statistical analysis. Except for the requirement that
the statistical analysis be adequate, there are no general rules as
to how statistical a research should be. Of 2 experimental plans,
either of which would provide appropriate data for checking a
given hypothesis or sets of hypotheses, that plan which calls for
simple statistical analysis is certainly preferable to the one which
requires elaborate analysis. Experimental control of errors is far
better than statistical adjustments.



CHAPTER 2


Tabular and Graphic Methods 


When we are faced with a mass of data, the first manipulative
step is tabulation or classification. If we are dealing with the
number of children per family, the tabulation is equivalent to
counting the number of one-child families, two-child families, etc.;
or if we have information on 1000 persons regarding their national
origin, we can tabulate, or count, the number of those of German,
French, Italian, etc., origin; or these same individuals can be
classified as to eye color. If we have their heights, we can also
classify (or tabulate) them as being 58, 59, 60, etc., inches in height,
and if the shortest person is 58 and the tallest is 78 inches, we
would tabulate our 1000 into 21 different inch groups. If we also
know the weights of these individuals, we can classify again, this
time as 100, 101, up to (say) 229 pounds, and thereby have 130
groups. In all these situations we can classify with respect to the
given characteristics, but the resulting tabulations will show marked
differences as we pass from trait to trait. For instance, we may
have only six national groups, and it will make little difference
whether Germans or Russians are first on the tabulation sheet.
Such a characteristic as nationality or eye color is said to be
unordered (and somewhat discrete). The number of children per family
is discrete but can be ordered, from least to greatest number.
Now such a trait as height can also be ordered, but it is said to be
continuous (nondiscrete) because it is possible to have an infinite
number of in-between values very closely spaced. Such a series
is sometimes called graduated. It will of course be obvious that a
discrete series does not permit of in-between values, e.g., no family
can have 2½ children.

For most purposes it is adequate if we tabulate, or classify,
individuals into certain large groups. For example, instead of
classifying our 1000 persons into pound groups (130 such groups)
it is usually sufficient to classify them into broader groups, say
100-109, 110-119, etc., thereby obtaining 13 large groups. As a
matter of fact, the use of fewer groups has a distinct advantage
in that the labor of tabulating and computing descriptive terms
is greatly lessened. The factors influencing the choice of the
grouping interval are two: first, its size should be such as to permit at
least 10 or 12, but not more than 20, classes or groups; and second,
it should promote tabulating convenience. Suggestions for
choosing tabulating intervals are: (1) determine the range of measures
or scores, i.e., the difference between the lowest and highest; (2) by
inspection determine whether the range can be divided into 12 to
20 equal intervals of some convenient size, say 5 or 10; and (3) let
the lower number of each interval be a multiple of the size of the
interval. It is customary to arrange the tabulation sheet with
the highest or largest values of the variable at the top and to use
either dots or tally marks when tabulating. The tallies per
interval can be counted and recorded to the right of the tally marks.
This column is usually labeled f, and the sum of the f's will be N,
or the total number of individuals in all the grouping intervals.
Tabulation results in a frequency table or frequency distribution,
such as that shown in the first two columns of Table 1.
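The three suggestions for choosing a tabulating interval, followed by the tally itself, can be sketched in Python. This is a minimal illustration under stated assumptions, not a procedure given in the text: scores are taken to be integers, the list of "convenient" sizes passed as `sizes` is hypothetical, and the first size that yields 12 to 20 intervals (each lower limit a multiple of the size) is accepted. Rows come out highest interval first, as on a tabulation sheet.

```python
def choose_interval(scores, sizes=(1, 2, 3, 5, 10, 20, 25, 50, 100),
                    min_classes=12, max_classes=20):
    """Return the first size in `sizes` giving min_classes..max_classes intervals.

    `sizes` is an assumed list of convenient widths; scores are assumed integers.
    """
    lo, hi = min(scores), max(scores)
    for size in sizes:
        first = (lo // size) * size          # lowest lower limit, a multiple of size
        n_classes = (hi - first) // size + 1
        if min_classes <= n_classes <= max_classes:
            return size
    raise ValueError("no size in `sizes` gives an acceptable number of intervals")


def frequency_table(scores, size):
    """Tally integer scores into intervals of width `size`.

    Returns (expressed lower limit, expressed upper limit, f) tuples,
    highest interval first; empty intervals inside the range get f = 0.
    """
    counts = {}
    for x in scores:
        lower = (x // size) * size
        counts[lower] = counts.get(lower, 0) + 1
    first = (min(scores) // size) * size
    lower = (max(scores) // size) * size
    rows = []
    while lower >= first:
        rows.append((lower, lower + size - 1, counts.get(lower, 0)))
        lower -= size
    return rows


# Hypothetical weights: one person at each pound from 100 to 229.
weights = list(range(100, 230))
size = choose_interval(weights)
for lower, upper, f in frequency_table(weights, size)[:3]:
    print(f"{lower}-{upper}  {f}")
```

With these hypothetical weights, sizes of 1, 2, 3, and 5 give too many intervals (130, 65, 44, and 26), and a size of 10 gives the 13 groups 100-109, 110-119, and so on.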


Table 1. Frequency Distribution of IQ's for 161 Five-year-old Boys

Interval     f    Smoothed f    Cumulative f
160-169      1        .3            161
150-159      0       1.3            160
140-149      3       4.0            160
130-139      9      13.7            157
120-129     29      25.7            148
110-119     39      34.3            119
100-109     35      35.3             80
 90-99      32      25.0             45
 80-89       8      14.0             13
 70-79       2       3.7              5
 60-69       1       1.3              3
 50-59       1       1.0              2
 40-49       1        .7              1


It should be noted that the expressed interval limits in a
frequency table are not necessarily the actual limits. Thus, if weight
has been taken to the nearest pound, the actual limits of the
interval 130-139 would be 129.5 and 139.5; but if the ages of individuals
have been taken as at the last birthday, the interval 20-24 would
have actual limits of 20 and 24.999+. Obviously for purposes of
tabulation we need not use the implied actual limits, and for
computational purposes we usually need either the lower limit or
the midpoint of certain intervals, so there is nothing to be gained
by meticulously labeling the intervals with actual limits.
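The 2 conventions for actual limits described in this paragraph can be expressed as a small function. This is only a sketch: the function names are hypothetical, and only the 2 cases the text mentions (rounding to the nearest unit, and truncation as with age at last birthday) are handled. For truncated measures it returns the exclusive upper bound 25 in place of the text's 24.999+.

```python
def actual_limits(lower, upper, unit=1, rule="nearest"):
    """Actual limits of an expressed interval `lower`-`upper`.

    rule="nearest": measures rounded to the nearest unit (weight to the
        nearest pound), so each limit extends half a unit outward.
    rule="last": measures truncated, as with age at last birthday, so the
        interval runs from `lower` up to, but not including, upper + unit.
    """
    if rule == "nearest":
        return lower - unit / 2, upper + unit / 2
    if rule == "last":
        return lower, upper + unit
    raise ValueError(f"unknown rule: {rule!r}")


def midpoint(lower, upper, unit=1, rule="nearest"):
    """Midpoint: actual lower limit plus one-half the interval size."""
    lo, hi = actual_limits(lower, upper, unit, rule)
    return lo + (hi - lo) / 2


print(actual_limits(130, 139))             # (129.5, 139.5)
print(actual_limits(20, 24, rule="last"))  # (20, 25)
```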



Fig. 1. Histogram for data of Table 1


GRAPHIC PRESENTATION 

If one scrutinizes the tally marks or the frequency table, he
can obtain some notion as to how the individual values are
distributed. A number of pictorial schemes have been suggested
as aids in the study of frequency distributions. It is possible to
lay off the various values (or intervals) of the variable on the
horizontal or x axis, and to let the vertical or y axis represent the
frequency per value or interval. The frequencies of the several
intervals can be represented by drawing a horizontal line across
each interval at the height corresponding to the number of cases
in that interval, and then connecting these horizontals with
verticals erected at the interval limits. This yields a histogram (Fig. 1).
Using the same arrangement of the vertical and horizontal scales,
one can merely indicate the frequency with a dot or cross placed
directly above the midpoint of the interval, and then connect the
adjacent points with straight lines. This results in a frequency
polygon (Fig. 2). Such a polygon or the corresponding histogram
will usually show irregularities; on the assumption that these are
due to the operation of chance, one can draw a smooth curve,
cutting as near the points as possible, and this curve can be thought
of as giving a better picture than the original polygon. A curve
which is obtained by freehand drawing or by graphic smoothing
schemes or by repeated smoothing of the frequencies by a method
of moving averages is known as a frequency curve.

Fig. 2. Frequency polygon for data of Table 1

One method of moving averages is illustrated in Table 1, in which
an average is taken over 3 intervals. The smoothed value for an
interval is obtained by summing the frequencies in that interval and the
2 adjacent intervals and dividing by 3. Thus the smoothed value
for the interval 80-89 is equal to the sum of the frequencies 2, 8,
and 32, divided by 3. For the 90-99 interval, 8, 32, and 35 are summed
and divided by 3. The student should plot both the original and
smoothed frequencies so as to compare the 2 graphs.
Although it is relatively easy to depict a frequency distribution by
a histogram or by a frequency polygon or by a smoothed frequency
curve, it is necessary that we note a shift in interpretation as we
pass from the histogram to the polygon to the curve. In drawing
the histogram we are in effect drawing a series of vertical bars with
a common boundary for any 2 that are adjacent to each other.
Since the height of each bar represents a frequency, we may, by
arbitrarily assigning unity as the width of each bar, say that the
area of a bar also represents a frequency. Then the sum of the
areas of the several bars will be the total number of cases, or N.
If we think of the polygon in Fig. 2 as being superimposed on the
histogram of Fig. 1 and imagine that the common boundaries of
the vertical bars have been erased, we will have a picture like that
in Fig. 3, in which the remaining parts of the bars have the
appearance of an up and then down irregular staircase. A little thought
should convince the reader that the total area under this staircase
is N, or precisely the same as the sum of the areas of all the bars.

Fig. 3. Frequency polygon superimposed on histogram

Next consider the polygon. Note that as we pass from interval
to interval the polygon in conjunction with the staircase histogram
forms a series of pairs of equal-area triangles. One of each pair is
an area included under the polygon but not under the histogram,
while the other is an area included under the histogram but not
under the polygon. The net effect of this balancing of areas, in
and out, is that the total areas under the polygon and histogram
are equal; each total area represents N.

Now it should not stretch one's imagination too much to regard
the total area under a smoothed polygon or under a frequency
curve as being equal to N. With this notion that area, not
height, represents frequency we can readily speak of the area under
the curve between ordinates erected at any 2 score values on the
base line (x axis) as the number of cases between the 2 score points.
And of course the area under any part of the curve could be ex- 
pressed as a proportion or a percentage of the total area. 

This concept of area as frequency will have considerable value
for us as a basis for interpreting certain statistical measures, and
the concept will be indispensable to our understanding of certain
“ideal,” or mathematical, frequency curves, as yet undefined.
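The argument that the histogram and the polygon enclose the same area can be verified numerically. In this sketch each bar's area is f times its width, and the polygon's area is found by the trapezoid rule; closing the polygon to the base line at the midpoints of the empty intervals just beyond each end is an assumption about how the polygon is drawn.

```python
def histogram_area(freqs, width=1.0):
    """Sum of bar areas: height f times bar width."""
    return sum(f * width for f in freqs)


def polygon_area(freqs, width=1.0):
    """Area under the frequency polygon by the trapezoid rule.

    Assumes the polygon is brought down to the base line at the midpoints
    of the empty intervals just beyond each end of the table.
    """
    heights = [0] + list(freqs) + [0]
    return sum((a + b) / 2 * width for a, b in zip(heights, heights[1:]))


table1_f = [1, 0, 3, 9, 29, 39, 35, 32, 8, 2, 1, 1, 1]
print(histogram_area(table1_f), polygon_area(table1_f))
```

For Table 1 with unit width both functions give 161, that is, N; with a width of 10 both give 1610, so the equality does not depend on the unit chosen.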



Fig. 4. Ogive for data of Table 1

Another type of graph can be obtained by the use of cumulative
frequencies. In Table 1 will be found a column headed “Cumulative
f.” These values are obtained by successive adding of the
frequencies, beginning with the lowest interval. Adding 1 and 1
gives 2, adding to this the next frequency gives 3, to which in
turn is added the next, giving 5, and so on until we have 160 plus 1
for the last cumulative value, which is the total number of cases.
Obviously, from the cumulative table one can tell how many
individuals fall below a given point. If one plots the cumulative
values and connects the plotted points, an ogive curve results
(Fig. 4). Note that, in plotting the cumulative frequencies, one
does not use the midpoint of the interval, but rather the upper
boundary. Why?
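The cumulative column and the points of the ogive can be computed as below. The sketch assumes IQs are recorded to the nearest whole point, so that each cumulative frequency is plotted at the actual upper limit of its interval (49.5 for the interval 40-49); the function name is hypothetical.

```python
from itertools import accumulate

# Table 1, lowest interval first: expressed lower limits and frequencies
lowers = [40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160]
freqs = [1, 1, 1, 2, 8, 32, 35, 39, 29, 9, 3, 0, 1]


def ogive_points(lowers, freqs, size=10, unit=1):
    """Pair each interval's actual upper limit with its cumulative frequency.

    Assumes scores rounded to the nearest `unit`, so the interval 40-49
    has actual upper limit 49.5.
    """
    cumulative = list(accumulate(freqs))      # running totals from the bottom
    uppers = [lo + size - unit / 2 for lo in lowers]
    return list(zip(uppers, cumulative))


points = ogive_points(lowers, freqs)
print(points[0], points[-1])
```

The running totals reproduce the Cumulative f column of Table 1, from 1 at the interval 40-49 up to 161 at the interval 160-169.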

The use of frequency polygons in the comparison of 2 groups
is quite simple and often very enlightening. All that is necessary
is to plot the data for both groups on the same sheet and with
reference to the same axes. If the number of cases in the 2 groups
differs markedly, a better comparison can be obtained by
converting the frequencies for each group to percentages of the total
number in each group. Polygons based on percentage frequencies
will not portray differences which are merely a reflection of differing
N's and therefore are more comparable. A glance at 2 such
frequency polygons will reveal whether the 2 groups show marked
differences in the trait in question or to what extent the 2
distributions overlap. More refined methods for comparing groups will
be discussed later.
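Converting each group's frequencies to percentages of its own N takes one line per group. The 2 groups below are hypothetical, chosen to have the same shape but N's differing tenfold, so that their percentage polygons coincide exactly even though their raw frequency polygons would not.

```python
def percent_frequencies(freqs):
    """Convert frequencies to percentages of the group's own N."""
    n = sum(freqs)
    return [100 * f / n for f in freqs]


# Hypothetical groups with the same shape but N = 20 and N = 200
group_a = [2, 5, 8, 5]
group_b = [20, 50, 80, 50]
print(percent_frequencies(group_a))
print(percent_frequencies(group_b))
```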

When one wishes to picture a discrete series, it is customary to
use either horizontal or vertical bars, separated from each other,
to represent the several frequencies. As in the case of frequency
polygons and histograms, there are no hard and fast rules
regarding the heights (or lengths) of the bars relative to the horizontal
(or vertical) base. The student should attempt to avoid extreme
lack of proportion. Often in newspapers and magazines one finds
that frequencies have been represented as areas or solids. A
circular diagram, or pie chart, in which the sizes of the separate
sectors represent the percentage falling into given groups or
classes is sometimes used to picture relative frequencies. There
is some evidence, and a general consensus of opinion, that some
type of linear graph is less likely to be misinterpreted than one
depending upon areas or solids.

Another type of graphical representation is used to picture the
relationship between 2 variables, e.g., growth in stature and age,
or price change with year. To make such a line graph, one can lay
off time or age or trials on the horizontal axis, choose a convenient
scale on the y axis for the other variable, and then plot the
observational values. The line graph should be arranged so that
the graph is read from left to right and from the bottom to the
top, and the scales on the 2 axes should allow the inclusion of
all observed values of the 2 variables and at the same time permit
of a well-balanced or well-proportioned picture. A line graph
can be made misleading by the choice of the scales on the 2 axes.
For instance, if one is plotting the practice curve for card sorting
(number of cards sorted on y axis, trial number on x axis), it is
possible to make a tremendous difference in the appearance of the
graph simply by altering the scale on the y axis. Of 2 curves
which represent the same relationship, one (Fig. 5a) would give
the impression that the learning had progressed quite rapidly,
whereas the other (Fig. 5b) would lead one to think that progress
was slow. The student will do well to develop a healthy scepticism
of all graphs which he encounters for the simple reason that either
scale can be so selected as to lead to gross misinterpretation.

Fig. 5a. Learning curve (same data as Fig. 5b)
Fig. 5b. Learning curve (same data as Fig. 5a)

It should be noted that smoothing may be applied to line graphs
as well as to frequency polygons. Often, if a line graph is smoothed,
the relationship between the two variables can be more adequately
characterized. Smoothing out the irregularities will help one to
see whether the relationship is linear or logarithmic or parabolic
or of some other common type. Frequently a verbal description
of a curve will aid in understanding something of the functional
relatedness of the 2 variables. To state a relationship in more
exact mathematical language involves the application of some
form of curve fitting by which the constants of the equation can
be determined.

The student who is interested in a complete discussion and 
treatment of graphic methods is referred to books on the subject 
by Brinton and by Arkin and Colton.* 

* Brinton, W. C., Graphic presentation. New York: Brinton Associates, 1939.
Arkin, Herbert, and Colton, R. R., Graphs, how to make and use them. New
York: Harper, 1936.





CHAPTER 3 


Describing Frequency Distributions 


It has been implied in Chapter 2 that a variable, such as height,
IQ, or reading ability, can be represented by X, where X takes on
various values, i.e., varies from individual to individual.
Obviously, X is not used here to represent an unknown but rather as
a symbol for any of several known quantities. When a frequency
polygon is drawn and smoothed, it is often found to be a curve
which has a peak or maximum near the center of the X's and drops
off gradually toward the base line or x axis on either side of the
point of maximum value. In other words, a typical frequency
curve (or polygon) or a frequency distribution can be roughly
characterized as one which shows 4 chief features: a clustering
of individuals toward some central value, dispersion about this
value, symmetry or lack of symmetry, and flatness or steepness.
Many variables or traits yield distributions which are said to be
approximately bell-shaped, but such a description is not adequate
for scientific purposes. One wishes to know about what particular
value and with how much scatter the individual scores are
distributed, to what extent the distribution is symmetrical, and to
what degree it is peaked or flat. That is, we need measures of
central value or tendency, measures of scatter or dispersion or
variability, and measures of skewness (lack of symmetry) and of
kurtosis (peakedness or flatness). With such measures, one can
describe the distribution mathematically, and in such a way that a
statistically trained contemporary, say in Melbourne, can picture
to himself the frequency distribution.

Thus we are led to a consideration of the various measures of
central value, dispersion, skewness, and kurtosis. It is adequate
and usually more economical of time to determine these measures
from frequency distributions rather than from the original
undistributed scores. Since the computation of the descriptive terms
frequently involves a determination of the lower limit or midpoint
of a class interval, the student should recall what has been said
about actual and expressed class limits. Obviously, if one needs
the midpoint of an interval, it is necessary only to add one-half
the size of the interval to the actual lower limit, which must be
determined by a consideration of the nature of the scores or
measures which constitute the variable. Psychological
measurements and test scores are usually treated as though rounded to the
nearest value.


MEASURES OF CENTRAL VALUE 

The mode. A glance at a typical frequency distribution will indicate to us the most frequently occurring X value, or for grouped data the group of X values which has the greatest frequency. This maximal frequency roughly defines the mode. For nongrouped data the mode is the X value having the greatest frequency, whereas for grouped data the mode is taken as the midpoint of the interval which has the greatest frequency. For a smoothed frequency curve, the mode is the X value at which the curve reaches its maximum height. The mode is one indicator of central value, but as a descriptive statistic it has serious limitations. If one uses a different size interval, the mode may be decidedly different. Furthermore, it occasionally happens that 2 nonadjacent intervals have the same maximal frequency, thereby yielding 2 modal values. Such a distribution is said to be bimodal, but it should be noted that the bimodality may not be real but merely accidental, the resultant of the particular grouping interval which has been chosen. In dealing with certain discrete series, like size of family, the modal value is apt to be more typical than some other measure of central value and therefore should be used, even though as a measure it is subject to greater sampling fluctuations than either the mean or the median. (The question of sampling cannot be discussed at this time; the student is asked to take on faith statements regarding the efficiency of a given statistic.)
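Both definitions of the mode can be sketched in a few lines of Python. This is a minimal illustration, assuming grouped data are supplied as actual lower limits plus frequencies, lowest interval first; the function names and the family-size figures are hypothetical. The grouped example uses the spool-packer distribution of Table 2 (below), first with 10-point intervals and then regrouped into 20-point intervals, to show how the mode moves with the choice of interval.

```python
from collections import Counter


def mode_ungrouped(scores):
    """Return every value having the greatest frequency (two values: bimodal)."""
    counts = Counter(scores)
    top = max(counts.values())
    return sorted(x for x, c in counts.items() if c == top)


def mode_grouped(lower_limits, freqs, i):
    """Return the midpoint(s) of the interval(s) having the greatest frequency.

    lower_limits are actual lower limits, so each midpoint is lower limit + i/2.
    """
    top = max(freqs)
    return [ll + i / 2 for ll, f in zip(lower_limits, freqs) if f == top]


# Hypothetical family sizes (a discrete series)
print(mode_ungrouped([2, 3, 3, 4, 2, 3, 5, 3, 2, 6]))  # [3]

# Table 2 with 10-point intervals, lowest (210-219) first
lower10 = [209.5 + 10 * k for k in range(11)]
freqs10 = [3, 0, 2, 8, 11, 12, 6, 1, 4, 2, 1]
print(mode_grouped(lower10, freqs10, 10))  # [264.5]

# The same 50 scores regrouped into 20-point intervals (210-229, 230-249, ...)
lower20 = [209.5 + 20 * k for k in range(6)]
freqs20 = [3, 10, 23, 7, 6, 1]
print(mode_grouped(lower20, freqs20, 20))  # [259.5]
```

Returning a list rather than a single number keeps ties visible, which is exactly the bimodal case the text warns may be an accident of grouping.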

The median. As a measure of central value, the median is defined in 2 ways: (1) if the individual scores are arranged in order with respect to some trait, the median is the value of the midmost individual if N is odd, or lies midway between the 2 middle individuals when N is even; (2) when a distribution has been made, the median is defined as the point on the scale such that the frequency above or below the point is 50 per cent of the total frequency. For grouped data, the median may be determined by the following steps:

1. Find one-half of N.

2. Count the frequencies in a cumulative manner from the bottom up to that interval, say the sth, the frequency of which if included would give more than, and if not included less than, N/2 cases. Obviously the median will fall somewhere in this interval unless exactly half the values fall below the lower limit of an interval, in which case this lower limit is the median. Let $F_c$ equal the total frequency up to the sth interval, and let $f_s$ equal the frequency in the sth interval.

3. $(N/2 - F_c)/f_s$ will be the proportional distance required in the sth interval to locate the median.

4. Letting i equal the size of the interval and $LL_s$ the actual lower limit of the sth interval, the median will be given by

$$\mathrm{Mdn} = LL_s + i\,\frac{N/2 - F_c}{f_s} \qquad (1)$$

This involves the defensible assumption that the scores for the cases falling in the sth interval are distributed fairly evenly over the possible score values in the interval.

The calculation of the median is illustrated in Table 2, in which is given the distribution of scores made by 50 college men on the Brown spool packer. The score is the number of spools packed in four 1.5-minute trials.

Table 2. The Calculation of the Median

Score      f
310-319    1
300-309    2
290-299    4
280-289    1
270-279    6
260-269   12
250-259   11
240-249    8
230-239    2
220-229    0
210-219    3
          50

N/2 = 25; the sth interval is 260-269
$F_c = 24$, $f_s = 12$, $i = 10$, $LL_s = 259.5$

$$\mathrm{Mdn} = 259.5 + 10\,\frac{25 - 24}{12} = 260.33$$
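The four steps and formula (1) can be sketched in Python as below. This is a minimal illustration, assuming the intervals are passed lowest first as actual lower limits with their frequencies; `grouped_median` is a hypothetical helper name. Applied to the Table 2 data it reproduces the value 260.33.

```python
def grouped_median(lower_limits, freqs, i):
    """Median of grouped data by formula (1): Mdn = LL_s + i * (N/2 - F_c) / f_s.

    Assumes the cases in the median interval are spread evenly across it.
    """
    half = sum(freqs) / 2
    cum = 0  # F_c: frequency below the interval being examined
    for ll, f in zip(lower_limits, freqs):
        if cum + f > half:  # including this interval would pass N/2
            return ll + i * (half - cum) / f
        cum += f
    raise ValueError("empty distribution")


lower = [209.5 + 10 * k for k in range(11)]  # actual lower limits, 210-219 first
freqs = [3, 0, 2, 8, 11, 12, 6, 1, 4, 2, 1]
print(round(grouped_median(lower, freqs, 10), 2))  # 260.33
```

When exactly half the cases lie below an interval, the test `cum + f > half` first succeeds at that interval with `half - cum == 0`, so the function returns its lower limit, as step 2 prescribes.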

The chief merits of the median are its ease of computation, its independence of extremes (it can be computed even if a known number of extremes have not been measured), and the fact that it is not affected by the size of extremes. This last point will be clearer after a discussion of the mean.

The mean. This arithmetic average will already be familiar to most readers. The mean is defined simply as the sum of all the scores or measures divided by their number, or

$$M = \frac{\Sigma X}{N} \qquad (2)$$

where X represents any score, the symbol Σ means “the sum of,” and N is the total number of cases. When N is small, this definition formula can be used to compute the mean, but when N is large, say 50, 100, or more, such a method is not economical of time. Ordinarily, when N is large, one makes a frequency distribution from which it is possible to compute the mean and median and other statistical measures. Assuming that the midpoint of an interval is typical of all the individuals in the interval, one can obtain the mean by summing the products of the several interval midpoints by their respective frequencies and dividing this sum by N. The error introduced by the use of midpoints is nonsystematic, i.e., it tends to be ironed out so far as the computed mean is concerned.

The computation of the mean can be shortened further by use of 
an arbitrary origin and deviations therefrom. The reasonableness 
of such a procedure can be readily grasped by considering the 
problem of determining the mean height of a group of men. We 
could measure each man's height from the floor or as so much in 
excess of a stationary bar 5 feet from the floor. The sum of the
excesses divided by N will be the mean excess, and obviously 
we must add 5 feet to this to obtain the mean height of the group. 

When we have a frequency distribution the arithmetic can be shortened still further by expressing the deviation from an arbitrary origin in terms of step intervals, that is, as the number of intervals that a given interval deviates from the arbitrary origin. The arbitrary origin is taken as the midpoint of any interval, and it is assumed that the midpoint of each interval may be taken as representing the scores in that interval.

The procedure can be developed by simple algebra. Let AO be the arbitrary origin, i be the interval size, and d be the deviation in step intervals of the midpoint of any interval from AO. Then each score can be expressed as $X = AO + id$, in which AO and i are constant and d varies. From the definition formula for the mean we have

$$M = \frac{\Sigma(AO + id)}{N} = \frac{\Sigma(AO) + \Sigma id}{N}$$

Now Σ(AO) will equal N(AO) because summing a constant N times is the same as multiplying it by N. As an exercise, the student should demonstrate, by taking varying numbers each multiplied by a constant, that $\Sigma id = i\Sigma d$; that is, a constant can be brought out from under the summation sign. Hence we have

$$M = \frac{N(AO)}{N} + \frac{i\,\Sigma d}{N} = AO + i\,\frac{\Sigma d}{N}$$


Since we started by summing N X's and since each X is associated with a d value, we should be summing N d's. That is, the d value for a particular interval needs to be summed f times (f being the frequency for the interval), but the sum for a particular interval is simply f times its d. If we replace Σd by Σfd we explicitly indicate that each d is to be summed as often as it occurs. Accordingly, our computational formula for the mean is written as

$$M = AO + i\,\frac{\Sigma fd}{N} \qquad (3)$$


In our algebraic derivation of formula (3) the only restriction placed on AO was that it be the midpoint of an interval; hence we are free to choose arbitrarily the midpoint of any interval as AO. In order to avoid negative d's, AO is ordinarily taken as the midpoint of the lowest interval. Table 3 indicates the computation of the mean from grouped data by use of an arbitrary origin and deviations therefrom in terms of step intervals.

Table 3. Calculation of the Mean

Score      f     d     fd
310-319    1    10     10
300-309    2     9     18
290-299    4     8     32
280-289    1     7      7
270-279    6     6     36
260-269   12     5     60
250-259   11     4     44
240-249    8     3     24
230-239    2     2      4
220-229    0     1      0
210-219    3     0      0
          50          235
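Formula (3) can be checked against the longer midpoint method in a short Python sketch. This is an illustration only, with hypothetical function names; it assumes, as Table 3 does, that AO is the midpoint of the lowest interval (214.5, the midpoint of 210-219) and that i = 10.

```python
def mean_from_midpoints(midpoints, freqs):
    """Sum of f times midpoint, divided by N (each interval's scores placed at its midpoint)."""
    n = sum(freqs)
    return sum(f * x for x, f in zip(midpoints, freqs)) / n


def mean_arbitrary_origin(ao, i, d_values, freqs):
    """Formula (3): M = AO + i * (sum of f*d) / N, with d in step intervals from AO."""
    n = sum(freqs)
    sum_fd = sum(f * d for d, f in zip(d_values, freqs))
    return ao + i * sum_fd / n


# Table 3, lowest interval (210-219) first
freqs = [3, 0, 2, 8, 11, 12, 6, 1, 4, 2, 1]
d_values = list(range(11))
midpoints = [214.5 + 10 * d for d in d_values]

print(sum(f * d for d, f in zip(d_values, freqs)))       # 235
print(mean_arbitrary_origin(214.5, 10, d_values, freqs))  # 261.5
print(mean_from_midpoints(midpoints, freqs))              # 261.5
```

Both routes give 261.5 because formula (3) is only an algebraic rearrangement of the midpoint method; the step deviations merely keep the hand arithmetic small.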

If we had taken AO near the center of the distribution we would 
be following the so-called guessed average method, a method which 
has the advantage of smaller d values but has the disadvantage of 
both negative and positive d’s.

Parenthetically, it might be pointed out that the use of the arbitrary origin, step-interval scheme is analogous to using coded scores. If we regard d as a coded value we see from $X = AO + id$ that $d = (X - AO)/i$, or that in general we have a coded score $X_c = (X - K)/k$, with K and k so chosen as to give coded values ranging from zero to between 10 and 20. Then the mean of the original scores is given by M = K + k times the mean of the coded scores.
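The coding idea, and the guessed-average variant with an origin near the centre, can be illustrated with a small Python sketch. It is a minimal example under stated assumptions: `mean_via_coding` is a hypothetical name, and the 50 scores are the Table 3 cases placed at their interval midpoints.

```python
def mean_via_coding(scores, K, k):
    """Code each score as (X - K) / k, average the codes, then decode: M = K + k * mean(codes)."""
    coded = [(x - K) / k for x in scores]
    return K + k * sum(coded) / len(coded)


# The 50 Table 3 cases, each placed at its interval midpoint
freqs = [3, 0, 2, 8, 11, 12, 6, 1, 4, 2, 1]
midpoints = [214.5 + 10 * d for d in range(11)]
scores = [x for x, f in zip(midpoints, freqs) for _ in range(f)]

print(mean_via_coding(scores, 214.5, 10))  # 261.5  (origin at lowest interval: codes 0..10)
print(mean_via_coding(scores, 264.5, 10))  # 261.5  ("guessed average": codes -5..5)
```

Any choice of K and k (k not zero) decodes to the same mean; the choice only affects how convenient the coded arithmetic is.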

The beginning student who is puzzled about which measure to use, the median or the mean, should remember that the purpose of measures of central value is description. When one is attempting to reduce a mass of scores or a distribution of measures to a few descriptive constants, the mean and median are both descriptive terms which more or less adequately depict the “average” or typical score, and the choice between the two is frequently determined on the basis of which is more typical. Thus, if 6 men run 100 yards in 9.6, 9.7, 9.8, 9.9, 10.0, and 14.0 seconds, the mean value of 10.5 is not as typical as the median value of 9.85. In general, the mean is not as typical as the median when there are extreme measures in one direction. However, when the scores are distributed in an approximately symmetrical fashion, the mean and median will be equal or nearly so, and either will be as typical as the other. The mean in this case has 2 distinct advantages over the median: (1) It is usually a more stable measure in the sampling sense, i.e., if we regard our scores as based on a sample of N individuals and then take another sample, the means of the 2 samples will in general show closer agreement than the 2 medians. This point will be discussed in more detail in the chapter on sampling errors. (2) It can be handled arithmetically and algebraically. The student should prove that, if the mean of $N_1$ cases is $M_1$, and of $N_2$ cases is $M_2$, the mean of the 2 groups combined will be given by

$$M_c = \frac{N_1 M_1 + N_2 M_2}{N_1 + N_2}$$

The median cannot be handled in such a fashion. Furthermore, the mean is used in connection with more advanced topics in statistics, whereas the median is seldom mentioned. Thus, unless the distribution is markedly skewed, the mean should be used. The problem of describing skewness will receive consideration after measures of variation have been discussed.
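The runners' example and the combined-mean rule can both be checked numerically. This short Python sketch uses the standard-library `statistics` module; the two small groups and the helper name `combined_mean` are hypothetical, chosen only for illustration.

```python
from statistics import mean, median

# The six 100-yard times from the text (seconds)
times = [9.6, 9.7, 9.8, 9.9, 10.0, 14.0]
print(round(mean(times), 2), round(median(times), 2))  # 10.5 9.85


def combined_mean(n1, m1, n2, m2):
    """Mean of two groups pooled: (N1*M1 + N2*M2) / (N1 + N2)."""
    return (n1 * m1 + n2 * m2) / (n1 + n2)


g1, g2 = [1, 2, 3], [10, 20]  # hypothetical groups
print(combined_mean(len(g1), mean(g1), len(g2), mean(g2)), mean(g1 + g2))  # 7.2 7.2
# No analogous rule exists for medians: the pooled median here is 3, which
# cannot be recovered from N1, N2 and the two group medians (2 and 15) alone.
```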

As exercises, the student should show algebraically or to his own satisfaction by numerical examples that (1) if a constant is added to or subtracted from the scores of a group, the new mean will be $M + C$ or $M - C$, where C is the given constant and M the mean of the original scores; (2) if all the scores are multiplied by a constant, C, the new mean will be CM, whereas dividing by a constant will lead to M/C as the new mean.
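For the numerical route to these exercises, a small Python check such as the following may help. It is a sketch with a hypothetical helper name and made-up scores; it uses `math.isclose` because dividing by a constant produces floating-point values.

```python
import math
from statistics import mean


def check_mean_rules(scores, c):
    """Compare the mean of transformed scores with the rules stated in the exercise (c != 0)."""
    m = mean(scores)
    return {
        "add": math.isclose(mean([x + c for x in scores]), m + c),
        "subtract": math.isclose(mean([x - c for x in scores]), m - c),
        "multiply": math.isclose(mean([x * c for x in scores]), c * m),
        "divide": math.isclose(mean([x / c for x in scores]), m / c),
    }


print(check_mean_rules([12, 15, 9, 20, 14], 5))  # every entry True
```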


MEASURES OF VARIATION 

The description of the extent of scatter (or cluster) about the central value may be obtained by any one of several measures. These measures differ somewhat in interpretation and usefulness. One may doubt whether the range (highest to lowest score) is of sufficient value in psychological research to justify its use as a measure of variation. It is, obviously, determined by the location of just 2 individual measures or scores and consequently tells us nothing about the general clustering of the scores about a central value.

Quartile deviation. An easily computed description of dispersion is the quartile deviation (Q), defined as $(Q_3 - Q_1)/2$, in which $Q_3$ (or the third quartile) is the point above which one-fourth of the cases fall and $Q_1$ (or the first quartile) is the point with three-fourths of the cases above. $Q_2$ (or the median) has already been defined as the point above which one-half of the cases fall. The computation of the 2 quartiles $Q_3$ and $Q_1$ from grouped data is essentially the same as that of the median. For instance, in determining the third quartile we count up to the interval in which the point falls which divides the number of cases into 2 parts, three-fourths below and one-fourth above. The distance into this interval is found in exactly the same manner as in computing the median. Since the quartiles are not influenced by extremes, it is customary to use them along with the median. By definition, 50 per cent of the cases fall between the first and third quartiles, but in nonsymmetrical distributions it is not likely that the limits indicated by the median plus and minus Q will include 50 per cent. It would seem better to report both the first and third quartiles, instead of Q, as these values along with the median will enable one to picture whether or not the clustering above the median is different from that below the median.

Percentiles. Closely allied to the quartiles are the percentiles. The Pth percentile is defined as a point below which P per cent of the cases fall. Thus the median is the 50th, the third quartile the 75th, and the first quartile the 25th percentile. The 10th, 20th, ..., 90th percentiles are sometimes called deciles. The computation of the percentiles from grouped data is accomplished in the manner indicated for computing the quartiles. The location of the zeroth and 100th percentiles is always perplexing. Since these 2 points are dependent upon the location of just 2 scores (i.e., are greatly influenced by chance), they are difficult to interpret. Common sense would suggest that the concept of these 2 percentiles be dropped.
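Because quartiles and percentiles are located exactly as the median is, formula (1) generalizes by replacing N/2 with PN/100. The Python sketch below assumes that generalization; `grouped_percentile` is a hypothetical name, and the data are the Table 2 frequencies, lowest interval first.

```python
def grouped_percentile(p, lower_limits, freqs, i):
    """Point below which p per cent of the cases fall (grouped data, 0 < p < 100).

    Same steps as formula (1) with N/2 replaced by p*N/100:
    P_p = LL + i * (p*N/100 - F_c) / f.
    """
    target = p * sum(freqs) / 100
    cum = 0  # frequency below the interval being examined
    for ll, f in zip(lower_limits, freqs):
        if cum + f > target:
            return ll + i * (target - cum) / f
        cum += f
    raise ValueError("p must be below 100")


lower = [209.5 + 10 * k for k in range(11)]  # Table 2, lowest interval first
freqs = [3, 0, 2, 8, 11, 12, 6, 1, 4, 2, 1]
q1 = grouped_percentile(25, lower, freqs, 10)
q3 = grouped_percentile(75, lower, freqs, 10)
print(q1, q3, (q3 - q1) / 2)  # 248.875 272.0 11.5625

p10 = grouped_percentile(10, lower, freqs, 10)
p90 = grouped_percentile(90, lower, freqs, 10)
print(p90 - p10)  # 55.0 (the 10th-90th percentile range)
```

With the median at 260.33, $Q_1$ lies about 11.5 points below it and $Q_3$ about 11.7 points above, which is the kind of comparison the text recommends making by reporting both quartiles.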

Percentiles may readily be associated with the cumulative frequency distribution, and with the ogive curve if cumulative percentage frequencies (obtained by dividing the cumulative f's by N and multiplying by 100) are used along the ordinate when plotting the ogive. In fact, the ogive may be used as a graphic scheme for determining score values corresponding to given percentiles. For instance, if we wish to obtain the 25th percentile point, we find 25 on the ordinate scale, proceed horizontally to the ogive curve, then vertically to the x axis, and read off the score corresponding to the 25th percentile. Scrutiny of Fig. 4, p. 10, will help the student understand the process. Could we also use the ogive as a basis for determining the percentile value of a given score?
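One way to answer that question is to run the interpolation in reverse: start from a score, accumulate the cases below it, and express the count as a percentage of N. The Python sketch below assumes, as formula (1) does, that the cases in each interval are spread evenly across it (which amounts to reading a straight-line ogive); `percentile_rank` is a hypothetical name.

```python
def percentile_rank(x, lower_limits, freqs, i):
    """Per cent of the cases falling below the score point x.

    Assumes the cases in each interval are spread evenly across it, i.e.
    linear interpolation on the cumulative percentage curve (the ogive).
    """
    n = sum(freqs)
    below = 0.0
    for ll, f in zip(lower_limits, freqs):
        if x >= ll + i:    # the whole interval lies below x
            below += f
        elif x > ll:       # x falls inside this interval: take a share of f
            below += f * (x - ll) / i
    return 100 * below / n


lower = [209.5 + 10 * k for k in range(11)]  # Table 2, lowest interval first
freqs = [3, 0, 2, 8, 11, 12, 6, 1, 4, 2, 1]
print(percentile_rank(259.5, lower, freqs, 10))  # 48.0
print(percentile_rank(272.0, lower, freqs, 10))  # 75.0 (272 is the third quartile here)
```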

The use of the difference between percentiles as an indication of dispersion should be obvious. In fact, the 10th-90th percentile range is a somewhat better (more stable from sample to sample) measure of dispersion than the quartile deviation. Percentiles, however, are chiefly of value in reporting the scores of individuals on psychological and educational tests. Ordinarily a raw score gives no inkling of what it means, whereas when it is said that an individual scores at or near the 85th percentile, the implication is that 15 per cent of his fellows score higher or better than he. Thus a percentile score carries with it some idea of the location of the individual with reference to the group. Furthermore, percentile scores for entirely different tests are comparable if derived from the same group or sample. The original raw scores might be in different units, e.g., number of additions per minute and time to read a page of prose, and consequently not at all comparable.

The average deviation. Sometimes called the mean deviation or mean variation, the average deviation (AD) is defined as the average of the deviations of the several scores from the mean. Thus, if $x = X - M$, then $AD = \Sigma|x|/N$, where $|x|$ is the absolute value of x, i.e., the negative deviations are treated as though positive. Currently the average deviation is seldom used; the student, however, needs to know something about it if he reads the earlier research literature in psychology.
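The definition translates directly into Python. This is a minimal sketch with a hypothetical function name, assuming grouped data in which each interval's scores are placed at its midpoint (Table 3, lowest interval first).

```python
def average_deviation(midpoints, freqs):
    """AD = sum of f*|X - M| / N, each interval's scores placed at its midpoint."""
    n = sum(freqs)
    m = sum(f * x for x, f in zip(midpoints, freqs)) / n
    return sum(f * abs(x - m) for x, f in zip(midpoints, freqs)) / n


freqs = [3, 0, 2, 8, 11, 12, 6, 1, 4, 2, 1]  # Table 3, lowest interval first
midpoints = [214.5 + 10 * d for d in range(11)]
print(average_deviation(midpoints, freqs))  # 16.32
```

For these same data Q works out to about 11.6 and the standard deviation to about 21.7, so the AD of 16.32 falls between them, in the order the next paragraph describes.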

Contrasted with the quartile deviation, the average deviation gives weight to extremes, and for the usual bell-shaped distribution the limits M plus and minus AD will include about 57.5 per cent of the cases; the average deviation is larger than Q but not so large as the standard deviation, to which we now turn.

The standard deviation. A third measure of variation, the standard deviation (SD or σ), is defined as

$$\sigma = \sqrt{\frac{\Sigma x^2}{N}} \qquad (4)$$

where $x = X - M$. To compute the standard deviation directly from this formula would be very cumbersome and uneconomical, since x will usually involve decimals. A computational formula involving deviations from an arbitrary origin (AO) can be easily derived by algebra. Such a derivation is included here in order further to familiarize the student with the method of handling summation signs. The derivation will be carried through for σ², technically known as the variance; then at the end we can take the square root to obtain σ.

From formula (4) we have

$$\sigma^2 = \frac{\Sigma x^2}{N}$$

in which $x = X - M$. As in deriving formula (3), we can set

$$X = AO + id$$

and since $M = AO + i(\Sigma d/N)$, we have, substituting in $x = X - M$,

$$x = AO + id - AO - i\,\frac{\Sigma d}{N} = id - ic$$

where for convenience we let c stand for Σd/N. Then

$$x^2 = (id - ic)^2 = i^2(d - c)^2$$

$$\Sigma x^2 = i^2\,\Sigma(d - c)^2 = i^2(\Sigma d^2 - 2c\,\Sigma d + Nc^2)$$

Dividing both sides by N, and noting that Σd/N = c, we have

$$\sigma^2 = i^2\left(\frac{\Sigma d^2}{N} - 2c^2 + c^2\right) = i^2\left[\frac{\Sigma d^2}{N} - \left(\frac{\Sigma d}{N}\right)^2\right]$$

hence

$$\sigma = \frac{i}{N}\sqrt{N\Sigma d^2 - (\Sigma d)^2}$$

But since this form does not make explicit the fact that each d, and each d², must be summed as often as it occurs, we will insert f



Replacing the last Σ by N (we are summing M² N times) and replacing M by ΣX/N, we have

$$\Sigma x^2 = \Sigma X^2 - 2\,\frac{\Sigma X}{N}\,\Sigma X + N\left(\frac{\Sigma X}{N}\right)^2 = \frac{N\Sigma X^2 - 2(\Sigma X)^2 + (\Sigma X)^2}{N}$$

$$\Sigma x^2 = \frac{1}{N}\left[N\Sigma X^2 - (\Sigma X)^2\right] \qquad (6a)$$

Substituting in formula (4) leads to an N² in the denominator, which can be brought out as 1/N; hence we have

$$\sigma = \frac{1}{N}\sqrt{N\Sigma X^2 - (\Sigma X)^2} \qquad (6b)$$

All the scores are simply squared and then summed to get ΣX², and ΣX has the same meaning as in formula (2).
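The definition formula (4), the raw-score formula (6b), and the step-deviation formula can be checked against one another in Python. This sketch assumes formula (5) has the grouped form $\sigma = (i/N)\sqrt{N\Sigma fd^2 - (\Sigma fd)^2}$ implied by the derivation above (the d's weighted by f); the function names are hypothetical, and the scores are the Table 3 cases placed at their midpoints.

```python
import math


def sd_definition(scores):
    """Formula (4): sqrt(sum of x**2 / N), with x = X - M."""
    n = len(scores)
    m = sum(scores) / n
    return math.sqrt(sum((x - m) ** 2 for x in scores) / n)


def sd_raw(scores):
    """Formula (6b): (1/N) * sqrt(N * sum X**2 - (sum X)**2)."""
    n = len(scores)
    sx = sum(scores)
    sxx = sum(x * x for x in scores)
    return math.sqrt(n * sxx - sx * sx) / n


def sd_step_deviations(i, d_values, freqs):
    """Assumed form of formula (5): (i/N) * sqrt(N * sum f*d**2 - (sum f*d)**2)."""
    n = sum(freqs)
    sfd = sum(f * d for d, f in zip(d_values, freqs))
    sfd2 = sum(f * d * d for d, f in zip(d_values, freqs))
    return i * math.sqrt(n * sfd2 - sfd * sfd) / n


# Table 3, lowest interval first; each score placed at its interval midpoint
freqs = [3, 0, 2, 8, 11, 12, 6, 1, 4, 2, 1]
d_values = list(range(11))
scores = [214.5 + 10 * d for d, f in zip(d_values, freqs) for _ in range(f)]

print([round(s, 3) for s in (sd_definition(scores), sd_raw(scores),
                              sd_step_deviations(10, d_values, freqs))])
# [21.656, 21.656, 21.656]
```

On a computer the raw-score form (6b) can lose precision when scores are large relative to their spread, because it subtracts two nearly equal large numbers; the deviation form (4) avoids that, which is why it is usually preferred in machine calculation.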

Although a mean computed by formula (3) from grouped data will not err systematically from the value obtained by formula (2), the use of formula (5) for calculating σ tends to give a value which is too large when compared with the nonapproximate value yielded either by (4) or by (6b). The reason for this is easily explained at the blackboard; we give here a hint. In general for an interval below the mean there will be more scores above than below the midpoint of the interval, while for an interval above the mean there will be more scores below than above the midpoint. Thus in taking the several midpoints as representing the scores within the several intervals we are in effect using values which deviate too far from the mean.

We may correct for the systematic error involved in using formula (5) by substituting in

$$\sigma = \sqrt{i^2\left[\frac{\Sigma fd^2}{N} - \left(\frac{\Sigma fd}{N}\right)^2\right] - \frac{i^2}{12}} \qquad (7)$$

The i²/12 is known as Sheppard's correction for grouping. The uncorrected and corrected values differ but little when 12 or more intervals have been used, and as the number of intervals is increased, the difference becomes smaller and smaller. If fewer than 10 intervals have been used, the error may be appreciable and the correction should be applied. These considerations form the basis for the suggested rule that at least 10 or 12, and not more than 20, intervals be used.
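The size of Sheppard's correction for the Table 3 data can be seen in a short Python sketch. It is an illustration only: `grouped_sd` is a hypothetical helper that subtracts i²/12 from the grouped variance before taking the square root, when asked to.

```python
import math


def grouped_sd(i, d_values, freqs, sheppard=False):
    """SD from grouped data via step deviations d; optionally apply Sheppard's
    correction, i.e. subtract i**2 / 12 from the variance before the square root."""
    n = sum(freqs)
    v1 = sum(f * d for d, f in zip(d_values, freqs)) / n
    v2 = sum(f * d * d for d, f in zip(d_values, freqs)) / n
    variance = i * i * (v2 - v1 * v1)
    if sheppard:
        variance -= i * i / 12
    return math.sqrt(variance)


freqs = [3, 0, 2, 8, 11, 12, 6, 1, 4, 2, 1]  # Table 3, lowest interval first
d_values = list(range(11))
raw = grouped_sd(10, d_values, freqs)
corrected = grouped_sd(10, d_values, freqs, sheppard=True)
print(round(raw, 2), round(corrected, 2))  # 21.66 21.46
```

With 11 intervals the correction lowers σ by under 1 per cent, consistent with the remark that the two values differ but little when a reasonable number of intervals is used.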

Regarding the interpretation of the standard deviation, it can be said that, when we have the usual symmetrical bell-shaped distribution, about 68 per cent of the cases will fall between the limits plus and minus 1σ from the mean, about 95 per cent between plus and minus 2σ, and nearly all the cases (99.73 per cent) between plus and minus 3σ. The standard deviation, even more than the average deviation, gives weight to extremes and therefore may not be as good as the quartiles for describing the dispersion. The standard deviation has decided advantages over other measures of dispersion: (1) Typically, it is more stable from the sampling viewpoint. (2) It can be handled algebraically, i.e., if we have 2 groups of $N_1$ and $N_2$ cases, with $M_1$ and $M_2$, and $\sigma_1$ and $\sigma_2$, as the respective means and standard deviations, we can obtain the standard deviation for the 2 groups combined by

$$\sigma_c = \sqrt{\frac{N_1\sigma_1^2 + N_2\sigma_2^2 + N_1(M_1 - M_c)^2 + N_2(M_2 - M_c)^2}{N_1 + N_2}} \qquad (8)$$

where the subscript c refers to the combined group. The mean for the combined group can be obtained by a formula given on p. 19. Formula (8) can be extended for determining the standard deviation for 3 or more groups combined. The student can make this extension as an exercise. (3) The standard deviation is a mathematical term which has considerable importance in more advanced statistical work. It is usually involved in the determination of sampling errors and is the measure of variation used in the analysis of variation and in connection with correlational analysis. Therefore, unless there are definite reasons for not using it, the standard deviation, instead of the average deviation or Q, should be used as a description of the amount of dispersion.
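Pooling standard deviations can be verified numerically. The Python sketch below uses the identity $\sigma_c^2 = [N_1\sigma_1^2 + N_2\sigma_2^2 + N_1(M_1 - M_c)^2 + N_2(M_2 - M_c)^2]/(N_1 + N_2)$, one standard way of writing the combined-group formula. It assumes SDs computed with divisor N, as in formula (4), which is what `statistics.pstdev` returns. The groups and the name `combined_sd` are hypothetical.

```python
import math
from statistics import mean, pstdev


def combined_sd(n1, m1, s1, n2, m2, s2):
    """SD of two groups pooled, from each group's N, mean and SD (divisor N)."""
    n = n1 + n2
    mc = (n1 * m1 + n2 * m2) / n  # combined mean
    ss = n1 * s1 ** 2 + n2 * s2 ** 2 + n1 * (m1 - mc) ** 2 + n2 * (m2 - mc) ** 2
    return math.sqrt(ss / n)


g1 = [4, 7, 9, 12]  # hypothetical group 1
g2 = [15, 18, 21]   # hypothetical group 2
pooled = combined_sd(len(g1), mean(g1), pstdev(g1), len(g2), mean(g2), pstdev(g2))
print(math.isclose(pooled, pstdev(g1 + g2)))  # True
```

The two squared-difference terms account for the spread between the group means; leaving them out would understate the combined variability whenever the groups differ in mean.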

As an exercise, show that, if a constant is added to or subtracted from each of a set of scores, the standard deviation does not change, and that multiplying or dividing each by a constant will lead to Cσ or σ/C, respectively, as the new standard deviation, where σ stands for the sigma of the original scores and C is the constant.
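A numerical check of this exercise, in the same spirit as the one for the mean, might look like the following Python sketch (hypothetical helper name and scores). It compares against |C|σ, since a standard deviation cannot be negative; for a positive constant this is the Cσ of the text.

```python
import math
from statistics import pstdev


def check_sd_rules(scores, c):
    """Compare the SD (divisor N) of transformed scores with the stated rules (c != 0)."""
    s = pstdev(scores)
    return {
        "add": math.isclose(pstdev([x + c for x in scores]), s),
        "subtract": math.isclose(pstdev([x - c for x in scores]), s),
        # abs(c): a standard deviation is never negative
        "multiply": math.isclose(pstdev([x * c for x in scores]), abs(c) * s),
        "divide": math.isclose(pstdev([x / c for x in scores]), s / abs(c)),
    }


print(check_sd_rules([12, 15, 9, 20, 14], 5))  # every entry True
```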


MEASURES OF SKEWNESS AND KURTOSIS 

If a distribution is not of the symmetrical bell-shaped type, it is not sufficient for descriptive purposes to report only the mean and standard deviation. We also need a measure of the lack of symmetry, i.e., of skewness, and frequently it is desirable to describe the distribution still further by giving a measure which indicates whether the distribution is relatively peaked or flat-topped, i.e., a measure of kurtosis.

Skewness can be described roughly by a number of measures, such as the difference between the mean and median divided by the standard deviation, or in terms of quartiles or percentiles. If an adequate and stable description of skewness is desired and if a measure of kurtosis is also needed, a method based on moments is to be preferred.

The first 4 moments about the mean are defined as follows: 


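$$\begin{aligned} \mu_1 &= \frac{\Sigma x}{N} = 0 \\ \mu_2 &= \frac{\Sigma x^2}{N} \\ \mu_3 &= \frac{\Sigma x^3}{N} \\ \mu_4 &= \frac{\Sigma x^4}{N} \end{aligned} \qquad (9)$$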


where x represents the deviation of each score from the mean of all the scores. For purposes of computation, the moments about an arbitrary origin can be determined, and then from these values we can obtain the moments about the mean. This procedure has already been employed in computing the standard deviation; i.e., we took deviations from an arbitrary origin. [The definition of the standard deviation, formula (4), was in terms of deviations from the mean.] If we use ν to represent moments about an arbitrary origin, the first 4 moments about AO can be defined as follows, where d is the score deviation from AO in step units:


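$$\begin{aligned} \nu_1 &= \frac{\Sigma fd}{N} \\ \nu_2 &= \frac{\Sigma fd^2}{N} \\ \nu_3 &= \frac{\Sigma fd^3}{N} \\ \nu_4 &= \frac{\Sigma fd^4}{N} \end{aligned} \qquad (10)$$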


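When the ν's have been calculated, the μ's can be readily determined from the following relationships:

$$\begin{aligned} \mu_1 &= 0 \\ \mu_2 &= i^2(\nu_2 - \nu_1^2) = \sigma^2 \\ \mu_3 &= i^3(\nu_3 - 3\nu_2\nu_1 + 2\nu_1^3) \\ \mu_4 &= i^4(\nu_4 - 4\nu_3\nu_1 + 6\nu_2\nu_1^2 - 3\nu_1^4) \end{aligned} \qquad (11)$$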


The student should note the similarity of the formula in (11) for 
the second moment to that given for the standard deviation 
[formula (5)] 
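The route from step-interval moments about AO to moments about the mean can be checked against a direct computation. In this Python sketch (hypothetical function names; Table 3 data with AO = 214.5 and i = 10) the conversion formulas are written out in full, so it does not depend on any particular typeset version of the relationships.

```python
def moments_about_mean(i, d_values, freqs):
    """Moments nu_r about AO in step units (sum f*d**r / N), converted to
    moments mu_2, mu_3, mu_4 about the mean in score units."""
    n = sum(freqs)
    v1, v2, v3, v4 = (sum(f * d ** r for d, f in zip(d_values, freqs)) / n
                      for r in (1, 2, 3, 4))
    mu2 = i ** 2 * (v2 - v1 ** 2)
    mu3 = i ** 3 * (v3 - 3 * v2 * v1 + 2 * v1 ** 3)
    mu4 = i ** 4 * (v4 - 4 * v3 * v1 + 6 * v2 * v1 ** 2 - 3 * v1 ** 4)
    return mu2, mu3, mu4


def moments_direct(midpoints, freqs):
    """Moments about the mean computed directly: mu_r = sum f*x**r / N, x = X - M."""
    n = sum(freqs)
    m = sum(f * x for x, f in zip(midpoints, freqs)) / n
    return tuple(sum(f * (x - m) ** r for x, f in zip(midpoints, freqs)) / n
                 for r in (2, 3, 4))


freqs = [3, 0, 2, 8, 11, 12, 6, 1, 4, 2, 1]  # Table 3, lowest interval first
d_values = list(range(11))                   # AO = 214.5, i = 10
midpoints = [214.5 + 10 * d for d in d_values]
print(moments_direct(midpoints, freqs))                                # (469.0, 1668.0, 726409.0)
print([round(v, 6) for v in moments_about_mean(10, d_values, freqs)])  # [469.0, 1668.0, 726409.0]
```

The first returned value is σ² = 469, matching the standard deviation of about 21.66 for these data.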

A measure of skewness defined in terms of moments is

$$g_1 = \frac{\mu_3}{\mu_2\sqrt{\mu_2}}, \qquad \beta_1 = \frac{\mu_3^2}{\mu_2^3} \qquad (12)$$

For symmetrical distributions the value of $g_1$ will be zero; hence the departure of $g_1$ from zero can be taken as a measure of skewness. The deviation of $g_1$ from zero, however, must be considered in light of the operation of chance or in terms of sampling errors (to be discussed later). The skewness is said to be positive when $g_1$ is positive and negative when $g_1$ is negative.

The degree of kurtosis can be described by

$$g_2 = \beta_2 - 3 = \frac{\mu_4}{\mu_2^2} - 3 \qquad (13)$$

When $g_2$ is less than zero, the distribution tends to be flat-topped, whereas for $g_2$ greater than zero it is relatively steep or peaked. When both $g_1$ and $g_2$ are zero or near zero, the distribution is of the usual symmetrical bell-shaped type, which is referred to as the “normal” frequency distribution.

Formulas (12) and (13) also define $\beta_1$ and $\beta_2$, which have been and are still used as measures of skewness and kurtosis. Recently, the g measures have come into usage because of certain advantages which need not be discussed here.

It will be noted that the measure of skewness involves taking the third moment relative to σ³ (since $\mu_2 = \sigma^2$), and that the measure of kurtosis depends upon the fourth moment relative to σ⁴. For a given distribution, all the values of $\mu_2$, $\mu_3$, and $\mu_4$ are in terms of the same measurement unit, say inches or pounds or IQ's or minutes; hence the ratios in formulas (12) and (13) are pure numbers, i.e., are not inches or pounds or IQ's or minutes. If we have the distribution of the weights and of the heights for 1000 individuals, the measure of skewness for the height distribution may be compared directly with that for the weight distribution. This is true by virtue of the fact that for each we are expressing the third moment relative to the amount of variability, both in inches for one distribution, both in pounds for the other. Likewise, it can be reasoned that the measures of kurtosis for different distributions are comparable, although the distributions involve different measurement units.
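The measures $g_1 = \mu_3/(\mu_2\sqrt{\mu_2})$ and $g_2 = \mu_4/\mu_2^2 - 3$, and their freedom from measurement units, can be illustrated with a short Python sketch. The function name is hypothetical. The Table 3 data are used only for illustration: with N = 50 they fall short of the 100 cases the text regards as a minimum for taking these measures seriously.

```python
import math


def skew_kurt(values, freqs):
    """Return (g1, g2) with g1 = mu3 / (mu2 * sqrt(mu2)) and g2 = mu4 / mu2**2 - 3."""
    n = sum(freqs)
    m = sum(f * x for x, f in zip(values, freqs)) / n
    mu = {r: sum(f * (x - m) ** r for x, f in zip(values, freqs)) / n for r in (2, 3, 4)}
    return mu[3] / (mu[2] * math.sqrt(mu[2])), mu[4] / mu[2] ** 2 - 3


freqs = [3, 0, 2, 8, 11, 12, 6, 1, 4, 2, 1]  # Table 3, lowest interval first
midpoints = [214.5 + 10 * d for d in range(11)]
g1, g2 = skew_kurt(midpoints, freqs)
print(round(g1, 3), round(g2, 3))  # 0.164 0.302

# Re-express the same scores in other units (times 2.54, then shifted): g1 and g2 do not change
rescaled = [2.54 * x + 100 for x in midpoints]
print(all(math.isclose(a, b) for a, b in zip(skew_kurt(rescaled, freqs), (g1, g2))))  # True
```

A change of unit multiplies $\mu_3$ by the cube and $\mu_2\sqrt{\mu_2}$ by the same cube of the scale factor, so the ratio is unchanged; the same cancellation holds for the fourth moment and $\mu_2^2$. A negative multiplier would, however, reverse the sign of $g_1$.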

In order to help the reader visualize the meaning of different values for $g_1$ as associated with different degrees of asymmetry, Fig. 6 has been prepared.

When we have determined the mean and the second, third, and fourth moments, and from the moments have derived expressions which tell us the degree of dispersion, skewness, and kurtosis, we have a description which is adequate for most distributions. These measures can be used to determine the type of mathematical equation which will fit an observed frequency polygon, i.e., we can write the equation of a frequency curve which fits the observed frequency distribution. A distribution frequently found in psychological research is of the “normal” type, which is sufficiently described by the mean and standard deviation. Ordinarily it is not necessary to compute $g_1$ unless the distribution “appears” to be skewed or to compute $g_2$ unless the distribution seems peaked or flat. The nature of the research, the type of variable being studied, and also the size of the sample are factors which need to be considered in making a decision as to the necessity for computing measures of skewness and kurtosis. It is seldom advisable to compute these measures when N is less than 100.

The student should be apprised of the fact that the rather frequent occurrence of symmetrical distributions for psychological variables may result from an artifact, and also that the occurrence of a skewed distribution may likewise be artifactual. This is true because very few of the instruments used in psychological “measurement” involve equal unit scales; the measuring units are frequently arbitrary or even accidental. Many of the variables are measured simply in terms of the number of items checked or the number of items correct. The shape of the resulting distributions is largely determined by the percentage checking the items or by the difficulty of the items. If the items are of medium difficulty for a group, it can be expected that the scale will yield a symmetrical distribution when applied to the group; if the items are easy, the scores will pile up toward the top (give negative skewness); if difficult, a piling up toward the bottom will occur.

Fig. 6. Polygons with different degrees of skewness.


In the absence of equal scale units for the measuring devices one cannot really say whether the distribution of, for example, arithmetic ability for a given group is symmetrical or skewed; all that can be said is that in terms of the units used the distribution has a particular shape.

From the foregoing it would seem that, since skewness (and kurtosis too) is partly a function of the accidental nature of the measuring units, the descriptive measures of shape would have little value in psychology. The fact remains, however, that sometimes it is desirable to specify the skewness and kurtosis of a distribution of scores merely as a part of the description of what happens when a scale of measurement, however arbitrary the units, is applied to a given group. Furthermore, it is to the student's advantage to know something of measures of skewness and kurtosis because we shall later have occasion to refer to them, and because he is apt to encounter them in more mathematical treatments of statistics.



CHAPTER 4 


Distribution Curves 


By successive smoothing of a polygon (or distribution), one can iron out irregularities until the polygon becomes a “smooth” or regular and uniform curve. We can think of this curve as being similar or nearly identical to what we would obtain were we to increase indefinitely the size of our sample and at the same time use smaller and smaller grouping intervals. That is, the limit of a polygon, as we allow N to approach infinity and the interval size to approach zero, is conceived to be a curve which is smooth and regular. Now such a uniform curve can usually be described in terms of a mathematical equation. The student may recall that the general equation for a straight line is $y = ax + b$, and that $y = 2x + 3$ is the equation for a particular line, that $x^2 + y^2 = a^2$ is the equation for a circle of radius a with the origin or intersection of the abscissa and ordinate at the center, also that $y = a + bx + cx^2$ is the general equation for a parabola. It is not until we give specific numerical values to the constants that we have equations for particular curves.

Frequency curves can be thought of as representing the relationship between two variables: y, or the height of the curve, and X, the variate or variable under consideration. Frequency polygons or distributions, even when smoothed, may be of various shapes: symmetrical or skewed, flat-topped or steep, humped near the center or at one end, bimodal or unimodal, J-shaped or U-shaped, falling off gradually or suddenly, etc. A complete description of a frequency distribution is obtained when we have succeeded in writing the equation of the curve which “fits” the distribution. The type of curve to be fitted is chosen on the basis of certain criteria which are derived from the moments and the interrelations among the moments. The late Professor Karl Pearson developed the mathematics of a system of frequency curves and classified distributions according to several “types” of curves, but a complete exposition of these types is beyond the scope of this text.
Normal curve. A bell-shaped curve which is often approximated closely by frequency distributions and which is intimately involved in much of statistical inference is known as the normal curve. We need to know in detail the properties of this curve. The general equation of the normal curve can be written as

$$y = \frac{N}{\sigma\sqrt{2\pi}}\,e^{-(X - M)^2/2\sigma^2} \qquad (14)$$

in which y represents the height for any value of the variable X, N is the number of cases, σ is the standard deviation, M is the mean of the distribution, and π (3.1416) and e (2.7183) are well-known mathematical constants. In order to write the equation of a particular normal curve, i.e., one which corresponds to a particular distribution, we need to know N, M, and σ. This is the basis for the fact that, when we have the usual bell-shaped distribution, we need only the mean and standard deviation to describe it adequately. But in order to say that a given distribution is really normal, it is necessary to show that the g's (as defined on p. 28) are zero or approximately zero.

Referring again to equation (14), we note that the numerator part of the exponent could be written in terms of deviation units, i.e., with x instead of X − M. The y for a positive deviation of, say, 10 will be exactly the same as for a negative 10 for the simple reason that the deviation value in the formula is squared. This indicates that the normal curve is symmetrical about the mean, and hence the mean and median coincide. When x = 0, i.e., when we take X = M, y has its maximal value, and therefore the mean and mode coincide. For values of x other than zero, the height of the curve will be less. This is evident when we consider the fact that the exponent in equation (14) is negative. The height of the curve as we go in either direction from the mean becomes less and less (see Fig. 7a). This dropping off is slow at first, then rapid, and then slow again. If we take the maximum value of y (i.e., at the mean) as unity, the ordinate at the point .5σ from the mean is about .883; at 1σ, about .606; at 2σ, .135; and at 3σ, .011. As we go still farther from the mean, the value of y becomes smaller, and as x approaches infinity, y approaches zero (asymptotic).
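The drop-off figures can be reproduced with a short Python sketch. It assumes only that, with the maximum ordinate taken as unity, equation (14) reduces to the ratio exp(−z²/2) with z = x/σ; the values are printed to four places and agree with the rounded figures above to within the last digit.

```python
import math

# With the maximum ordinate (at the mean) taken as unity, equation (14)
# reduces to y / y_max = exp(-z**2 / 2), where z = x / sigma.
relative = {z: math.exp(-z ** 2 / 2) for z in (0.5, 1.0, 2.0, 3.0)}
for z, ratio in relative.items():
    print(f"{z} sigma from the mean: {ratio:.4f}")
```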



Theoretically, the curve never touches the base line, but so far as empirical distributions are concerned, y does become zero.

Fig. 7a. Normal curve (base line marked at −3σ, −2σ, −σ, 0, +σ, +2σ, +3σ).

For both the frequency polygon and the histogram, the frequency for a given interval is represented along the y axis or ordinate, but for smoothed curves and for mathematical curves such as that defined by equation (14), it is advantageous to regard the area under the curve for a particular grouping interval on the x axis as indicating the frequency for that interval. Accordingly the total area under the curve corresponds to the total frequency, or N, and the area under any given part of the curve, i.e., the area between any two x values, can be expressed as a percentage of the total. For example, the area included between the mean and the point on the base line 1σ above the mean is 34.13 per cent of the total, and the area between plus and minus 1σ is 68.26 per cent. The latter percentage has already been given on p. 26 as one way of interpreting the standard deviation. The limits plus and minus 2σ will include 95.45 per cent; plus and minus 3σ, 99.73 per cent; and plus and minus 4σ, 99.9936 per cent. Theoretically, one must pass to plus and minus infinity to include all the area, but in practice 100 per cent of the cases will usually fall within the limits ±3σ, and nearly always within the limits ±4σ.
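These percentages can be checked numerically. The sketch below is an illustration only, not how the tables were built: it uses the standard identity that the normal area between −kσ and +kσ equals erf(k/√2), with Python's `math.erf`. The printed values match the percentages above to within rounding in the last place quoted.

```python
import math

def area_within(k):
    """Proportion of the normal curve lying between -k sigma and +k sigma."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3, 4):
    print(f"+/-{k} sigma: {100 * area_within(k):.4f} per cent")
print(f"mean to +1 sigma: {50 * area_within(1):.4f} per cent")
```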

When we transform a set of scores to the so-called standard score form

$$z = \frac{X - M}{\sigma} = \frac{x}{\sigma} \qquad (15)$$


we have each score expressed as a deviation from the mean in terms of multiples of the standard deviation of the original distribution. It can easily be shown that the standard deviation of our new set of scores will be unity, and the mean zero. The frequency polygon for the standard scores will have exactly the same shape as that for the original scores; this transformation is equivalent to translating the origin along the x axis to the point corresponding to the mean and changing the scale on the x axis so as to make the standard deviation equal to unity. If we let the total frequency be unity, we can think of the total area under the curve as being unity. This is equivalent to saying that N equals 1, and since with standard scores σ also equals 1, equation (14) can be written as

$$y = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2} \qquad (16)$$


The value of 1/√(2π) is about .39894, and therefore at z = 0 (i.e., at the mean) y will equal .39894, which is the maximum y for the normal curve of unit area and unit standard deviation. The ordinates for other values of z will be less. For instance, at z = ±1, y = .24197, and at z = ±2, y = .05399.
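A minimal Python sketch of equation (16) reproduces these ordinates; `unit_normal_ordinate` is a hypothetical helper name, not anything from the text.

```python
import math

def unit_normal_ordinate(z):
    """Ordinate y of the unit normal curve, equation (16)."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

for z in (0, 1, -1, 2, -2):
    print(z, round(unit_normal_ordinate(z), 5))
```

Because z enters only as z², the ordinates at +1 and −1 (and at +2 and −2) come out identical.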

The percentage area under any part of the curve can be determined by methods of the calculus. The area under the curve between any two values, z₁ and z₂, is obtained as the value of the integral

$$A = \int_{z_1}^{z_2} y\,dz \qquad (17)$$

Perhaps this expression will be more meaningful to the student who has not studied integral calculus if the given area is regarded as composed of a large number of strips, each having a tiny base dz and a height of y. For each such strip the area will be nearly y dz, and the integral sign in formula (17) simply means the "sum of" the areas of these tiny strips.
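The strip picture translates directly into code. The Python sketch below sums many thin strips of width dz, taking each strip's height at its midpoint (the midpoint choice and the 10,000-strip default are assumptions made for illustration). It compares the sum with the closed-form area from `math.erf` for the region from the mean to z = 1.

```python
import math

def y(z):
    """Unit normal ordinate, equation (16)."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def area_by_strips(z1, z2, strips=10_000):
    """Approximate integral (17) as a sum of strips of width dz and height y,
    taking each strip's height at its midpoint."""
    dz = (z2 - z1) / strips
    return sum(y(z1 + (i + 0.5) * dz) * dz for i in range(strips))

approx = area_by_strips(0.0, 1.0)
exact = 0.5 * math.erf(1 / math.sqrt(2))   # closed-form area, mean to z = 1
print(round(approx, 5), round(exact, 5))
```

With this many strips the sum and the exact area agree to well beyond five places, which is the sense in which the integral is a "sum of" strip areas.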

The student of the calculus will also note that the first derivative of either equation (14) or (16), set equal to zero and solved, will yield a maximum for the curve when x or z equals zero, thus proving more rigorously that the mean and mode coincide. If the second derivative is set equal to zero and solved for x or z, it will be found that the points of inflection of the curve are located where x is ±σ or z is ±1.
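The inflection points can also be located numerically without doing the calculus. The Python sketch below estimates the second derivative of equation (16) by a central finite difference; the step size h = 1e-4 is an arbitrary choice. The sign is negative (concave) between −1 and +1 and positive outside, so it changes at z = ±1.

```python
import math

def y(z):
    """Unit normal ordinate, equation (16)."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def second_derivative(f, z, h=1e-4):
    """Central finite-difference estimate of f''(z)."""
    return (f(z + h) - 2 * f(z) + f(z - h)) / (h * h)

# The curve is concave (f'' < 0) between -1 and +1 and convex outside.
for z in (-1.5, -0.5, 0.5, 1.5):
    print(z, second_derivative(y, z) > 0)
```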

Normal curve table. Because of the widespread use of the normal curve, tables of proportionate frequencies and ordinates for various z or x/σ values are available. The student need not be able to integrate equation (17) in order to understand a table of the normal curve functions. Table A of the Appendix contains four columns, the first of which is z or x/σ values. The second column gives the area of the curve from the mean out to the corresponding z value, this area being the same whether z is positive or negative; a given z divides the curve into two parts, and the third column gives the area of the smaller part. The area of the larger part can be obtained by adding .5 to the entries in column 2. If one wishes to determine the proportionate area between plus and minus a given z, the values in column 2 should be doubled. The fourth column gives the y or ordinate for each of the z values. For purposes of reference, the meanings of the several entries in Table A are illustrated in Fig. 7b, in which an ordinate (dotted) has been erected at an x/σ value of +.8. The area from the mean to +.8 is found from column 2 as .28814; the area below this point is .78814, and that above is .21186, of the total area. Note that .78814 plus .21186 equals unity and that .78814 is .50000 plus .28814. The height of the curve at z = .8 is found from column 4 as .2897, whereas the maximum height of .3989 is at the mean.

Fig. 7b. Normal curve functions.
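The four Table A columns can be rebuilt from Python's standard-library `statistics.NormalDist`. In the sketch below, `table_a_row` is a hypothetical helper written for illustration. It reproduces the entries quoted for z = .8, and the larger part is recovered as .5 plus the column 2 entry, as described above.

```python
from statistics import NormalDist

unit = NormalDist()  # mean 0, standard deviation 1

def table_a_row(z):
    """Hypothetical helper returning the four Table A entries for a given z."""
    mean_to_z = unit.cdf(abs(z)) - 0.5   # column 2: area from the mean to z
    smaller = 0.5 - mean_to_z            # column 3: area of the smaller part
    ordinate = unit.pdf(z)               # column 4: height y at z
    return z, round(mean_to_z, 5), round(smaller, 5), round(ordinate, 4)

row = table_a_row(0.8)
print(row)
print("larger part:", round(0.5 + row[1], 5))
```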

It is frequently useful to know the relationship between the various measures of dispersion for a normal distribution. It can be shown that the following hold true:

Q = .8453 AD = .6745 SD
AD = 1.1829 Q = .7979 SD
SD = 1.4826 Q = 1.2533 AD
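These constants follow from two facts about a normal curve with SD = 1: the quartile deviation Q is the z cutting off the upper quarter, and the average deviation AD is √(2/π). A minimal Python sketch under those standard results:

```python
import math
from statistics import NormalDist

sd = 1.0
q = NormalDist().inv_cdf(0.75) * sd   # quartile deviation: half the interquartile range
ad = math.sqrt(2 / math.pi) * sd      # average (mean) deviation of a normal curve
print(round(q / sd, 4), round(ad / sd, 4))   # Q and AD in SD units
print(round(q / ad, 4), round(ad / q, 4))    # Q in AD units, AD in Q units
print(round(sd / q, 4), round(sd / ad, 4))   # SD in Q units, SD in AD units
```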

It is also useful to know that for an N of 50 the SD will be about one-fifth the range, that for an N of 200 the SD will be about one-sixth the range, and that for an N of 1000 the SD will be about one-seventh the range.

The tabled values for the normal curve are often used in connection with problems similar to the following: If a distribution of the heights of men is normal with a mean of 68.0 inches and a standard deviation of 2.5, what percentage of men are more than 6 feet tall? We find z as the difference between 72 and 68, divided by σ, or z = 1.6; then from Table A we find the percentage of cases which fall above this z value to be 5.48. Suppose that the mean IQ of 10-year-old boys is 100 and the standard deviation 16. What percentage have IQ's between 90 and 110? What percentage of 10-year-old boys would be classified as "gifted" (IQ above 140)?
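The same problems can be worked with Python's `statistics.NormalDist`, which plays the role of Table A here. This is a sketch. It reproduces the 5.48 per cent for heights, and it computes the two IQ proportions for the student to compare with a Table A solution.

```python
from statistics import NormalDist

heights = NormalDist(mu=68.0, sigma=2.5)
z = (72 - heights.mean) / heights.stdev     # standard score of 6 feet
above_six_feet = 1 - heights.cdf(72)
print(z, round(100 * above_six_feet, 2))    # z and per cent taller than 6 feet

iq = NormalDist(mu=100, sigma=16)
between_90_and_110 = iq.cdf(110) - iq.cdf(90)
gifted = 1 - iq.cdf(140)
print(round(100 * between_90_and_110, 2), round(100 * gifted, 2))
```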

The student will have noted that the answers to problems similar to the foregoing are possible by virtue of the fact that the areas and ordinates of Table A are for the standard score form of the normal curve with total area set equal to unity. By formula (15) one can pass from raw scores to standard scores and vice versa, and knowing N one can readily convert proportionate areas to frequencies or frequencies to proportions. Thus the table can be used with any normal distribution regardless of the original measurement units.

Standard scores. Perhaps it should be pointed out at this place that transforming scores, when distributions are normal or approximately so, to standard scores leads to new sets of scores which are comparable. For example, inches and pounds are not comparable units. If a man is 71 inches in height and weighs 170 pounds, it is impossible to say whether he is taller than he is heavy, but when the 71 inches is transformed to a z of .9 and the 170 pounds to a z of 1.3, we are able to say that, relative to his position in the two distributions, he is heavier than he is tall. Likewise, the raw scores on two psychological tests will seldom be comparable; changing to standard scores permits comparison, so that one can decide whether a boy's performance on one test is better or worse than his performance on another. This assumes, of course, a close approximation to normality, and that the means and standard deviations used in the transformations are based on the same or highly similar groups.

Standard scores, as defined by formula (15), will involve both positive and negative values and decimal scores. Since these are awkward to use, a further transformation is frequently made in such a way as to yield a distribution with a preassigned M and σ, instead of the M of 0 and σ of 1 which hold for the standard scores defined by formula (15). If we wish a distribution with a mean of 50 and a σ of 10, we can simply multiply each z by 10 and add 50. Multiplying each z by 20 and adding 100 would yield a mean of 100 and a σ of 20. Either of these transformations will get rid of negative values and permit a sufficient number of score values without the use of decimals. In general, if we wish to transform a set of scores having a mean, M, and a standard deviation, σ, to new values to be called Z's, with mean equal to any value K and σ equal to S, all we need to do is to apply the relationship

$$Z = z(S) + K, \quad\text{or}\quad Z = \frac{X - M}{\sigma}(S) + K$$

which becomes

$$Z = \frac{S}{\sigma}(X) - \frac{M}{\sigma}(S) + K$$

The last form is the easier to use in practice, particularly with a calculating machine. Note that the last two terms will combine numerically and therefore can be placed in the lower dial as a positive or negative number; then the numerical value of S/σ can be set in the keyboard as a constant to be multiplied in turn upon the varying values of X. If the machine has a continuous upper dial, the best procedure is to multiply by the highest X first, and then, without clearing the dials, to subtract once for each successively lower value of X. Care is needed in aligning decimals, a check on which can be obtained by multiplying by the X nearest M. This should lead to a value, in the lower dial, which is near K. With this setup, one can readily run off a table giving the values of Z for varying values of X.
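The same economy carries over to code. In the relationship Z = (S/σ)X − (M/σ)S + K, the slope S/σ and the combined constant K − (M/σ)S are computed once and then applied to every X. In this minimal Python sketch, `to_Z` and the raw scores are hypothetical. σ is taken as the population standard deviation (divisor N), which is an assumption about how σ was computed.

```python
import statistics

def to_Z(scores, K=50.0, S=10.0):
    """Transform raw scores to Z = (S/sigma) X - (M/sigma) S + K."""
    M = statistics.fmean(scores)
    sigma = statistics.pstdev(scores)   # population SD (divisor N), an assumption
    slope = S / sigma                   # the constant set once, as on the keyboard
    intercept = K - (M / sigma) * S     # the last two terms combined numerically
    return [slope * x + intercept for x in scores]

raw = [12, 15, 18, 20, 25]              # hypothetical raw scores
Z = to_Z(raw)
print([round(v, 2) for v in Z])
print(round(statistics.fmean(Z), 6), round(statistics.pstdev(Z), 6))
```

Calling `to_Z(raw, K=100, S=20)` gives the alternative scale with mean 100 and σ of 20.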

The comparability of two sets of standard scores, either as z's or as Z's with the same mean (K) and same σ (S), does not hold for skewed distributions unless the two distributions show the same degree and direction of skewness. This is unlikely to be the case in practice. There is a scheme for use with skewed distributions which not only leads to comparable units but which also normalizes the distributions, i.e., changes the distributions from skewed to normal. This procedure is known as T scaling, and the resulting scores are known as T scores. They are usually so calculated as to yield a mean of 50 and a σ of 10, but other values for these constants are possible. The detailed procedure may be found in McCall's Measurement,* which also includes a table for expediting the transformation. Suffice it to say here that T scaling basically involves determining the proportion (or percentage) of cases exceeding a given value plus half those reaching that value, and then entering such proportions in a table of the normal curve function to find the corresponding z values. Standard scores based on a normal distribution of original scores and T scores based on any shape distribution are comparable, provided they have been so determined as to yield the same mean and standard deviation. They differ only in the way in which they are computed, the standard score being a linear transformation which leaves the shape of the distribution unchanged, whereas T scaling changes the distribution to the normal form. If we begin with an exactly normal distribution and convert the scores to both z's and T's, there will be a linear correspondence between the two sets of transformed scores. If their means and sigmas are set equal, the Z's and T's will be equal to each other.
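The core of T scaling, as summarized above, can be sketched in Python. This is a rough illustration, not McCall's tabled procedure. `t_scores` is a hypothetical name and the raw scores are invented. For each distinct score, the sketch takes the proportion of cases exceeding it plus half of those at it, converts that upper-tail proportion to a normal deviate, and rescales to mean 50 and σ 10.

```python
from collections import Counter
from statistics import NormalDist

def t_scores(scores, mean=50.0, sd=10.0):
    """Sketch of T scaling: (cases exceeding + half of those at the value) / N,
    converted to a normal deviate and rescaled."""
    n = len(scores)
    counts = Counter(scores)
    unit = NormalDist()
    table = {}
    for value in sorted(counts):
        exceeding = sum(c for v, c in counts.items() if v > value)
        p_upper = (exceeding + counts[value] / 2) / n   # always strictly between 0 and 1
        z = unit.inv_cdf(1 - p_upper)                   # deviate cutting off p_upper above
        table[value] = mean + sd * z
    return table

raw = [3, 5, 5, 6, 7, 7, 7, 8, 9, 14]   # hypothetical, positively skewed
for value, t in t_scores(raw).items():
    print(value, round(t, 1))
```

Note that the long gap between 9 and 14 in the raw scores shrinks on the T scale, which is the normalizing effect described above.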

* McCall, W. A., Measurement, New York: Macmillan, 1939, pp. 505-506.

It will be recalled that the use of percentiles is another way of expressing scores on different tests so as to have comparability. The student should give sufficient thought to percentiles and standard scores to see how they are interrelated when the original scores are normal in distribution. Hint: The tabled functions (Table A) of the normal curve may help. The student might also demonstrate to his own satisfaction that the difference between the 50th and 60th percentile points is not apt to be equal to the difference between the 80th and 90th percentile points.
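For a normal distribution the percentile points can be expressed in standard-score units with the inverse of the normal curve function. The short Python sketch below does this with `statistics.NormalDist.inv_cdf` and shows that the two gaps are unequal.

```python
from statistics import NormalDist

unit = NormalDist()
gap_50_60 = unit.inv_cdf(0.60) - unit.inv_cdf(0.50)   # in z (sigma) units
gap_80_90 = unit.inv_cdf(0.90) - unit.inv_cdf(0.80)
print(round(gap_50_60, 4), round(gap_80_90, 4))
```

Equal steps in percentile rank correspond to wider steps in z the farther one moves from the median, because the curve is lower there.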

Kinds of distributions. In anticipation of topics to be discussed, it might be well to mention some possible ways of regarding frequency distributions. We can have an observed, or sample, distribution of scores for a group of N individuals; we can imagine a population distribution of scores for either a finite or an infinite N; and we can conceive of a distribution curve defined by a mathematical equation (or function). Because of chance factors (as yet undefined herein) we do not expect an observed sample distribution to be exactly like the distribution of the population from which the sample is drawn or like a defined mathematical distribution.

Since we are seldom able to measure all members of a population, we can only assume that population scores follow some defined mathematical distribution. The form of mathematical curve assumed is usually decided upon by a consideration of the shape of an observed sample distribution. As will be seen later, the reasonableness of the assumption can be checked statistically.

It is possible, however, to show mathematically that under prescribed conditions given measures will follow a defined distribution curve exactly. We shall refer to such a distribution as theoretical or expected. Strictly speaking, a mathematical distribution curve holds only for a continuous variable. If we had the distribution for a discrete variable, such as number of children per family, we would never expect that increasing N would produce a curve: the variable takes on only point values 0, 1, 2, etc., hence we cannot allow the interval size (see p. 32) to approach zero, which is necessary for a smooth curve.

As implied above, there are distribution curves which are not normal. We shall introduce other curves (or functions) when needed. Thus far, the normal curve has been discussed as a frequency curve, and the area interpretation has been in terms of the number of individuals or percentage of cases falling between certain score limits. This same curve is often spoken of as the normal probability curve, and as such it is regarded as a theoretical curve. We shall see, moreover, that there are theoretical curves other than the normal curve which may be regarded as probability curves.



CHAPTER 5 

Probability and Hypothesis Testing 


Statistical inference and the testing of hypotheses involve the concept of chance, or probability. A simple example will serve to illustrate the probabilistic nature of hypothesis testing. Suppose a chap claims that he can distinguish between Camels and Lucky Strikes. To test his claim we could blindfold him and present him with either a Camel or a Lucky Strike (the brand to be presented is determined by tossing a coin). If on this 1 trial he correctly named the brand we would not be inclined to accept his claim, since he would have a 50-50 chance of being correct on a sheer guessing basis. So we give him a second trial (again, and for any subsequent trials, we toss a coin to determine which brand to present to him). If he were again successful we might give some credence to his claim, but someone might ask whether making 2 correct discriminations could happen on the basis of chance. We shall presently see that the chances are 1 in 4 of getting 2 correct, i.e., success on 2 trials could easily occur on the basis of chance.

But suppose he is correct on 3 trials, then on the fourth trial, and also on the fifth; or perhaps he is correct on 10 trials, or perhaps on 9 of 10 trials? Regardless of the number of trials and the number of successes we certainly should have some information about chance success, or the probability of correctly naming the brands on the basis of chance guessing, before we reach a decision regarding the claimed ability to distinguish between the 2 brands of cigarettes. This and similar decision problems involve notions of probability, to which we now turn.


Probability. If one had a box containing 70 white and 30 black balls, well mixed, and were to draw 1 ball at random, the chance of the drawn ball's being black is said to be 30 out of 100, and the chance of its being white would be .70. This can be interpreted to mean that, if we made 1000 random draws, each time replacing the drawn ball and remixing the contents of the box, the percentage of black balls drawn would be about 30, and of white draws about 70. If one rolls a die, the probability of obtaining a 4 is 1/6, i.e., a large number of rolls would yield a 4 about 1/6 of the time. If one tosses a symmetrical coin, it is usually said that there is a fifty-fifty chance of its landing "heads up," or the probability of a head is 1/2. This is another way of saying that in the long run the proportion of times that the coin lands as a head will be the same as the proportion of times it lands as a tail.

These very simple examples illustrate a definition of probability: if an event can happen in A ways and fail in B ways, all possible ways being equally likely, the probability of its occurring is A/(A + B) and of its failing is B/(A + B). That is, a probability figure is the ratio of the number of favorable events to the total number of events, and it is therefore necessary that we be able to enumerate events in order to arrive at a probability figure.
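The A/(A + B) definition can be written as a tiny Python function using exact fractions. The function name `probability` is hypothetical, and the three calls restate the ball, die, and coin examples.

```python
from fractions import Fraction

def probability(favorable, unfavorable):
    """Classical probability A/(A + B), all ways assumed equally likely."""
    return Fraction(favorable, favorable + unfavorable)

print(probability(30, 70))   # black ball from 70 white and 30 black
print(probability(1, 5))     # a 4 on one roll of a die
print(probability(1, 1))     # a head on a symmetrical coin
```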

If we draw a card from a pack, the probability of obtaining a spade is 1/4, and the probability of drawing a club is also 1/4, but the probability of drawing either a spade or a club is 1/4 plus 1/4, or 1/2. If we roll a die, the probability of obtaining either a 4 or a 5 is 1/6 plus 1/6, or 1/3. These two situations illustrate the addition theorem of probability: the probability that either one event or another event will happen is the sum of the probabilities of their occurrences as single events. (The events must be mutually exclusive; i.e., if one occurs, the other cannot.)
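The addition theorem can be checked by direct enumeration. This Python sketch assumes a standard 52-card pack of 13 ranks in each of 4 suits. Counting cards that are spades or clubs gives the same fraction as adding the two separate probabilities, because no card is both.

```python
from fractions import Fraction

# A 52-card pack: 13 ranks in each of 4 suits.
deck = [(rank, suit) for suit in ("spade", "club", "heart", "diamond")
        for rank in range(1, 14)]

def p(event):
    """Proportion of equally likely cards for which event(card) is true."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

p_spade = p(lambda card: card[1] == "spade")
p_club = p(lambda card: card[1] == "club")
p_spade_or_club = p(lambda card: card[1] in ("spade", "club"))
print(p_spade, p_club, p_spade_or_club)

# Die: the two faces are mutually exclusive, so their probabilities add.
p_4_or_5 = Fraction(1, 6) + Fraction(1, 6)
print(p_4_or_5)
```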

If we roll a pair of dice, the probability of a 2 on the first and a 5 on the second is 1/6 times 1/6, or 1/36. If we toss 2 coins, the probability that the first will land a head and the second a head is 1/2 times 1/2, or 1/4, which is, of course, the probability that both will land as heads. Notice that the result obtained with the second die or coin is independent of the outcome of the first die or coin. These two examples illustrate the multiplication theorem: the probability of 2 (or more) independent events occurring simultaneously or in succession (one and the other) is the product of their separate probabilities.
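Enumerating all 36 equally likely outcomes for a pair of dice gives the same answer as multiplying the separate probabilities. A short Python sketch using `itertools.product`:

```python
from fractions import Fraction
from itertools import product

# Enumerate the 36 equally likely outcomes for a pair of dice.
pairs = list(product(range(1, 7), repeat=2))
favorable = sum(1 for first, second in pairs if first == 2 and second == 5)
p_2_then_5 = Fraction(favorable, len(pairs))
# Multiplication theorem for independent events.
p_product = Fraction(1, 6) * Fraction(1, 6)
print(p_2_then_5, p_product)
```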





As just indicated, if one tosses 2 coins, the probability that the first will land a head and also the second a head will be 1/2 times 1/2, or 1/4, which is the probability that both will fall as heads. The probability that the first will land a head and the second a tail will also be 1/2 times 1/2, or 1/4. But 1 head and 1 tail can be obtained in a manner mutually exclusive to the above; i.e., the first can land as a tail and the second as a head, and this combination or event has a probability of 1/4, whence the probability of obtaining 1 head and 1 tail will be 1/4 plus 1/4, or 1/2. This same result can be arrived at by listing all the possible combinations and taking the ratio of the number of favorable to the total number of possible combinations. The possible combinations are HH, HT, TH, TT, from which we see that 2 out of the 4 possible events are favorable for the occurrence of 1 head and 1 tail. We also note that 1 out of 4 is favorable to 2 heads.
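The listing of HH, HT, TH, TT can be generated mechanically. A minimal Python sketch:

```python
from fractions import Fraction
from itertools import product

outcomes = ["".join(t) for t in product("HT", repeat=2)]
one_each = [o for o in outcomes if o.count("H") == 1]   # HT and TH
two_heads = [o for o in outcomes if o == "HH"]
p_one_each = Fraction(len(one_each), len(outcomes))
p_two_heads = Fraction(len(two_heads), len(outcomes))
print(outcomes, p_one_each, p_two_heads)
```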

Suppose we were to toss 3 coins; we would have the following possible combinations:

Coin 1   H  H  H  H  T  T  T  T
Coin 2   H  H  T  T  H  H  T  T
Coin 3   H  T  H  T  H  T  H  T

The total number of possible "events" is 8, 1 of which is favorable to 3 heads, 3 to 2 heads, 3 to 1 head, and 1 to no heads, thus giving the respective probabilities of 1/8, 3/8, 3/8, and 1/8. If we were to toss 4 coins, we would have the following probabilities:

4 heads   1/16
3 heads   4/16
2 heads   6/16
1 head    4/16
0 heads   1/16




The student should satisfy himself that these are the correct figures by writing down all the combinations possible and counting those favorable to any particular number of heads.
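The exercise of writing down every combination can be delegated to a short Python sketch. `head_probabilities` is a hypothetical helper. Note that `Fraction` reduces its results, so 4/16 prints as 1/4.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def head_probabilities(n_coins):
    """Enumerate all 2**n outcomes and count those with each number of heads."""
    outcomes = list(product("HT", repeat=n_coins))
    counts = Counter(o.count("H") for o in outcomes)
    return {h: Fraction(counts[h], len(outcomes)) for h in range(n_coins, -1, -1)}

for n in (3, 4):
    print(n, {h: str(pr) for h, pr in head_probabilities(n).items()})
```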
Binomial distribution. The process of determining possible combinations becomes quite laborious for, say, 10 coins, but the several probabilities can be obtained by the coefficients in the expansion of the binomial (a + b)ⁿ. Thus for n = 2 (i.e., 2 coins) we have a² + 2ab + b², or 1, 2, 1; for n = 3, a³ + 3a²b + 3ab² + b³, or 1, 3, 3, 1; for n = 4 the coefficients are 1, 4, 6, 4, 1. In each case the sum of the coefficients, 2ⁿ, will be the total possible combinations, and the coefficients taken as ratios with the common denominator, 2ⁿ, will represent the probabilities for n, n − 1, n − 2, ..., 0 heads.

The student may recall that the general expansion of (a + b)ⁿ is

$$a^n + n a^{n-1} b + \frac{n(n-1)}{1 \times 2} a^{n-2} b^2 + \frac{n(n-1)(n-2)}{1 \times 2 \times 3} a^{n-3} b^3 + \cdots$$

This expansion will contain (n + 1) terms and will terminate in bⁿ. For n = 10, we have the following coefficients: 1, 10, 45, 120, 210, 252, 210, 120, 45, 10, 1, which sum to 1024, or 2 to the tenth power. Thus the probability that all 10 coins will fall as heads is 1/1024; 9 heads, 10/1024; etc. If we plot these values as a frequency polygon (these coefficients are frequencies in the sense that they represent the expected number of times for 10 heads, 9 heads, etc., out of a total of 1024 tosses), we will have a bell-shaped graph which will resemble somewhat the normal curve.
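Python's `math.comb` gives the binomial coefficients directly, so the n = 10 row and its total of 2¹⁰ can be checked in a few lines.

```python
import math

n = 10
# Coefficient for k heads is n! / (k! (n - k)!); listed from 10 heads down to 0.
coefficients = [math.comb(n, k) for k in range(n, -1, -1)]
print(coefficients, sum(coefficients))
```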

Another and more useful way, for our purpose, of considering the binomial expansion is to use p and q in the place of a and b, with p defined as the probability of success on a single element and q as the probability of failure, or q = 1 − p. Thus we would have (p + q)ⁿ. Suppose n = 2; the expression would be p² + 2pq + q². If p = 1/2, as in the coin situation, this would give (1/2)² + 2(1/2)(1/2) + (1/2)², or 1/4, 2/4, and 1/4 as the probabilities for securing 2 heads, 1 head, and 0 heads respectively. Each term is itself a probability fraction; the numerators are 1, 2, and 1 as before. For n = 10, we would have (1/2)¹⁰ or 1/1024, 10(1/2)⁹(1/2) or 10/1024, 45(1/2)⁸(1/2)² or 45/1024, etc., as the probabilities for obtaining 10 heads, 9 heads, 8 heads, etc.
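The terms of (p + q)ⁿ can be generated for any p with exact arithmetic. In this Python sketch, `binomial_terms` is a hypothetical helper. The terms run from n successes down to 0, matching the order used in the text.

```python
import math
from fractions import Fraction

def binomial_terms(n, p):
    """Terms of (p + q)**n, from n successes down to 0 successes."""
    q = 1 - p
    return [math.comb(n, k) * p**k * q**(n - k) for k in range(n, -1, -1)]

print([str(t) for t in binomial_terms(2, Fraction(1, 2))])
print([str(t) for t in binomial_terms(10, Fraction(1, 2))[:3]])
```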

The chief advantage of using the p and q notation is that we can readily see what happens when p is not equal to 1/2. Consider the expectation when we roll a pair of dice with "success" defined as the rolling of "snake eyes." We would have (p + q)² = (1/6 + 5/6)² = 1/36 + 2(5/36) + 25/36 as indicating the probability of obtaining 2 one-spots, 1 one-spot, and 0 one-spots. If 3 dice were rolled, we would have (1/6)³ + 3(1/6)²(5/6) + 3(1/6)(5/6)² + (5/6)³, or 1/216, 15/216, 75/216, and 125/216 as the respective probabilities for 3, 2, 1, and 0 one-spots. The important thing for the student to note is that these probabilities are definitely skewed; not all probability distributions are of the symmetrical type. The student can, as a tedious exercise, work out the probabilities for 4, 5, 6, 7, and 8 dice, and therefrom learn that the shape of the distribution changes from marked skewness to less and less skewness as the number of dice is increased.* It can be easily shown that, if p = 5/6 and q = 1/6, the skewness will be in the opposite direction. Another proposition which the student can demonstrate to himself is that, for a fixed n, the skewness increases as p is taken farther from 1/2 in either direction; extremely small or extremely large p's (near unity) lead to very marked skewness.
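The "tedious exercise" can be mechanized. The Python sketch below does two things. First, it lists the numerators over 6ⁿ for the one-spot counts. Second, it computes the moment skewness directly from each binomial distribution (third central moment divided by σ³, a standard definition assumed here). The helper names are hypothetical.

```python
import math

def one_spot_counts(n_dice):
    """Numerators over 6**n of the (1/6 + 5/6)**n terms, for n, n-1, ..., 0 one-spots."""
    return [math.comb(n_dice, k) * 5 ** (n_dice - k) for k in range(n_dice, -1, -1)]

def skewness(n, p):
    """Moment skewness computed directly from the binomial probabilities."""
    probs = {k: math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}
    mean = sum(k * pk for k, pk in probs.items())
    var = sum((k - mean) ** 2 * pk for k, pk in probs.items())
    third = sum((k - mean) ** 3 * pk for k, pk in probs.items())
    return third / var ** 1.5

print(one_spot_counts(2), 6 ** 2)
print(one_spot_counts(3), 6 ** 3)
for n in range(2, 9):
    print(n, round(skewness(n, 1 / 6), 3))   # shrinks as dice are added
```

Calling `skewness(n, 5 / 6)` gives the same magnitudes with a negative sign, which is the opposite direction mentioned above.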

The binomial expansion provides the probabilities of the theoretically expected frequencies for given n's, p's, and q's. Such theoretical distributions can be described as to central value, variation, skewness, and kurtosis. The numerical values for these measures may be obtained by direct computation from the distributions built up by the binomial expansion, or these measures may be obtained by simple formulas, which can be derived by simple algebra, without having the actual distributions available.

The student can, as an exercise, perform an empirical check on the formulas for the mean and standard deviation. The formulas are:

$$M = np$$

$$\sigma = \sqrt{npq}$$

$$g_1 = \frac{q - p}{\sqrt{npq}} \qquad \text{(skewness)}$$

$$g_2 = \frac{1 - 6pq}{npq} \qquad \text{(kurtosis)}$$

It should be noted that n is the number of elements, not the number of cases. The formula for skewness permits several deductions. When p = 1/2, q also equals 1/2, and hence the skewness is zero; the degree of skewness for a fixed n depends upon the deviation of p from 1/2, i.e., the smaller or the larger the probability of success for each element, the more skewed the distribution. Note also that, since n is in the denominator, the larger the number (n) of elements, the smaller the skewness for fixed values of p and q. The above formulas describe the theoretically expected distribution for given n's, p's, and q's. As will be seen later, any empirical distribution obtained by tossing 10 coins or rolling 3 dice will yield values which, for reasons to be discussed, will only approximate these values.
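The suggested check can be done by computer for the theoretical distribution. The sketch compares moments computed directly from the binomial probabilities with the closed forms M = np, σ = √(npq), skewness (q − p)/√(npq), and kurtosis (1 − 6pq)/npq, where kurtosis is measured relative to the normal value of 3. It uses population moments, and both helper names are hypothetical.

```python
import math

def direct_moments(n, p):
    """Mean, SD, skewness, and kurtosis (minus 3) from the binomial probabilities."""
    probs = [math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
    mean = sum(k * pk for k, pk in enumerate(probs))
    m2 = sum((k - mean) ** 2 * pk for k, pk in enumerate(probs))
    m3 = sum((k - mean) ** 3 * pk for k, pk in enumerate(probs))
    m4 = sum((k - mean) ** 4 * pk for k, pk in enumerate(probs))
    return mean, math.sqrt(m2), m3 / m2 ** 1.5, m4 / m2 ** 2 - 3

def formula_moments(n, p):
    """The same four measures from the closed-form formulas."""
    q = 1 - p
    npq = n * p * q
    return n * p, math.sqrt(npq), (q - p) / math.sqrt(npq), (1 - 6 * p * q) / npq

for n, p in [(10, 0.5), (3, 1 / 6)]:
    print([round(v, 4) for v in direct_moments(n, p)])
    print([round(v, 4) for v in formula_moments(n, p)])
```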

It is of interest to consider plotting the binomial distribution as a histogram: the height of the successive bars will indicate the several expected frequencies, each of which is the numerator for a probability fraction. Now, if we work out the expected frequencies for number of heads when 20 coins are tossed, and if in drawing the histogram we so scale the ordinate as to have the over-all height about the same as that for the 10-coin situation and also squeeze the base-line scale (ranging from 0 to 20) into about the same over-all distance as for 10 coins, the vertical bars will be narrower, and the resulting picture will look more like a normal histogram than that obtained for 10 coins. If we repeat the process with n larger and larger, each time scaling our axes to about the same size as used for 10 coins and for 20 coins, the several bars of the histograms will become narrower and narrower, and with n sufficiently large the bars will seem to merge and the contour of the graph will tend to appear indistinguishable from a normal curve.

The normal curve is for a continuous variable on the x axis, whereas the binomial distribution involves a discrete variable, or point series. For example, it is impossible to have any values between, say, 22 and 23 heads. As n is taken larger and larger, and the total base line is kept fixed, the obtained values or possible points become more and more closely spaced so that the point series approaches, or at least takes on the appearance of, continuity. As n approaches infinity, the binomial distribution approaches the normal distribution as a limit.
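The rescaling argument can be imitated numerically. For p = 1/2, the sketch below measures the largest gap between each binomial probability and the normal ordinate with the same mean and σ. The gap is multiplied by σ, which plays the part of rescaling the axes so that graphs for different n have comparable heights. This is an illustrative measure, not one from the text, and `max_discrepancy` is a hypothetical name.

```python
import math
from statistics import NormalDist

def max_discrepancy(n, p=0.5):
    """Largest gap between binomial probabilities and normal ordinates
    (same mean and sigma), rescaled by sigma."""
    mu, sigma = n * p, math.sqrt(n * p * (1 - p))
    curve = NormalDist(mu, sigma)
    return max(abs(math.comb(n, k) * p**k * (1 - p)**(n - k) - curve.pdf(k)) * sigma
               for k in range(n + 1))

for n in (10, 20, 100, 1000):
    print(n, round(max_discrepancy(n), 5))
```

The printed gaps fall steadily as n grows, in keeping with the limiting behavior described above.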

Approximation of probabilities. The foregoing suggests the possibility of using the normal curve as a basis for approximating the probabilities obtainable by the binomial expansion. In order to see how this might be done we shall consider the binomial distribution for n = 16 for the coin tossing situation, as shown in Table 5. Suppose we wish to ascertain the probability of getting at least 12 heads. This would be the sum of the separate probabilities of tossing 12, 13, 14, 15, and 16 heads. These probabilities would be the respective "expected frequencies" each divided by 65,536; hence the sum of the probabilities would be obtained by summing the numerators: 1, 16, 120, 560, and 1820, then dividing this sum, 2517, by 65,536. Thus the probability of securing at least 12 heads (12 or more) would be 2517/65,536, or a decimal equivalent of .03841 (to 5 places).

Table 5. Binomial Distribution for 16 Coins

Number of   Expected        Number of   Expected
  Heads     Frequencies       Heads     Frequencies
   16              1            7          11,440
   15             16            6           8,008
   14            120            5           4,368
   13            560            4           1,820
   12          1,820            3             560
   11          4,368            2             120
   10          8,008            1              16
    9         11,440            0               1
    8         12,870
                                Total      65,536
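Table 5 and the exact probability of at least 12 heads can be regenerated with `math.comb`. A minimal Python sketch:

```python
import math

n = 16
expected = {k: math.comb(n, k) for k in range(n + 1)}   # Table 5 frequencies
at_least_12 = sum(expected[k] for k in range(12, n + 1))
print(at_least_12, 2 ** n, round(at_least_12 / 2 ** n, 5))
```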

Now let us attempt to find the same probability by using the normal curve approximation. First we note that for the distribution in Table 5 the mean will be np = 16(.5) = 8 and the σ will be √(npq) = √(16(.5)(.5)) = 2. It will help us understand the method of approximation if we superimpose on the histogram of the frequencies in Table 5 a normal curve having a mean of 8 and a σ of 2 (see Fig. 8). If we regard the area of each bar as representing an expected frequency, we see that the sum of the areas for the bars based on 12, 13, 14, 15, and 16 heads divided by the total area of all the bars (= 65,536) will give the probability value of .03841 reported above. To approximate this by the normal curve we need to consider the area under the curve for that part of the curve which spans the bars with base-line values of 12, 13, 14, 15, and 16. Obviously we need the area under the curve beyond an X value of 11.5, a value which doesn't make much sense in terms of number of heads but which does make sense when it is recalled that we are here treating a point (discrete) variable as though it were a continuous variable, normally distributed. Hence we have X − M = 11.5 − 8 = 3.5 = x, and x/σ = 3.5/2 = 1.75. Turning to Table A we find that the proportionate area under a normal curve beyond an x/σ of 1.75 is .04006. This is our approximation to the exact probability value of .03841; the error in this approximation is of the order of .002. In general, when n is fairly large the failure to shift .5 (e.g., from 12 to 11.5 as done here) leads to a negligible error. This shift of .5 is referred to as correction for continuity.

Fig. 8. Normal curve fitted to binomial distribution.
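The approximation, with and without the shift of .5, can be computed side by side. This Python sketch uses `statistics.NormalDist` in place of Table A. For n as small as 16, dropping the correction for continuity makes a noticeable difference.

```python
import math
from statistics import NormalDist

n, p = 16, 0.5
curve = NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))   # mean 8, sigma 2

exact = sum(math.comb(n, k) for k in range(12, n + 1)) / 2 ** n
with_correction = 1 - curve.cdf(11.5)      # area beyond x/sigma = 1.75
without_correction = 1 - curve.cdf(12.0)   # no shift of .5
print(round(exact, 5), round(with_correction, 5), round(without_correction, 5))
```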

We can, of course, use the normal curve to approximate any of the exact probabilities obtainable from Table 5 (or from the binomial with n other than 16). For example, the exact probability of obtaining 10 or 11 or 12 heads is (8008 + 4368 + 1820)/65,536, or .21661. The normal curve approximation, calculated as the proportionate area under the curve from 9.5 to 12.5, is .21441.
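A Python sketch of this second comparison follows. The computed normal area agrees with the .21441 above to within the rounding of a printed table.

```python
import math
from statistics import NormalDist

n = 16
curve = NormalDist(mu=8, sigma=2)
exact = sum(math.comb(n, k) for k in (10, 11, 12)) / 2 ** n
approx = curve.cdf(12.5) - curve.cdf(9.5)   # area from 9.5 to 12.5
print(round(exact, 5), round(approx, 5))
```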

It is fortunate for us that as n is taken larger and larger the normal curve approximation becomes better and better, since for n large the computation of exact probabilities by the binomial method becomes very arduous.

Notice that, in approximating the probability, we have utilized an area under a curve; i.e., we have said that the area between 2 X values taken relative to a total area may be interpreted as a probability figure. This is not inconsistent with our original definition of probability involving number (frequency) of events favorable relative to a total number of events (total frequency). Since, as previously indicated, the total area under a frequency curve for a continuous variable (or function) can be regarded as the total frequency, and the area for a particular segment can be regarded as the frequency with which values (or scores) fall in the given segment, it follows that the ratio of the segmental to the total frequency may be spoken of as a probability: the probability that a score falls between the 2 X values defining the segment. When we are dealing with a distribution of the normal type, the probability associated with a given segment is found by converting the 2 X values, which define an interval, into x/σ values and then determining the area from Table A. The obtained proportionate area represents the probability expressed as a decimal fraction.

It should be obvious that, when we consider the unit normal curve, we can readily specify the proportionate area between any 2 x/σ values, say z_1 and z_2. We can then interpret the proportion as the probability of obtaining x/σ values between the given z_1 and z_2. By reference to tables more extensive than Table A, it can be found that the area between an x/σ of −1.96 and an x/σ of +1.96 is very nearly .95. Hence it would be said that .95 represents the probability of obtaining x/σ values between these 2 points. Furthermore, .05 represents the probability that an x/σ, drawn at random from a normally distributed supply of (x/σ)'s, will be numerically larger than 1.96. Similarly, the probability of drawing an x/σ between ±2.576 is very nearly .99, while the probability for an x/σ falling outside these limits is .01.
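As a cross-check on these two pairs of limits, here is a minimal Python sketch. It computes unit-normal areas with the standard-library `math.erf` in place of an extended table. The function names are illustrative only.

```python
import math

def normal_cdf(z):
    """Area under the unit normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def area_between(z1, z2):
    """Proportionate area (probability) between two x/sigma values."""
    return normal_cdf(z2) - normal_cdf(z1)

for z in (1.96, 2.576):
    inside = area_between(-z, z)
    outside = 1.0 - inside    # probability of a numerically larger x/sigma
    print(f"+/-{z}: inside {inside:.4f}, outside {outside:.4f}")
```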

The foregoing interpretation of proportionate areas under the normal curve as probabilities is, in a sense, the basis for sometimes calling this curve the normal probability curve. It has been noted that for p not equal to q, the point binomial leads to skewed probability distributions. For continuous functions it is also possible to have distributions, other than the normal, which permit probability statements on the basis of proportionate areas. Later we shall consider the use of 3 nonnormal probability distributions.


HYPOTHESIS TESTING 

We may now return to a consideration of the blindfold test of the claimed ability to distinguish between 2 cigarette brands. By using the binomial expansion we can readily specify the probability of being correct (by chance) n times out of n trials. The answer is simply 1/2^n. If there were 10 trials, the probability of 10 correct choices by chance guessing (no real discriminatory ability) would be 1/1024, or about .001. The probability of being correct 16 out of 16 trials would be 1/65,536, or about .000015. Suppose our self-proclaimed expert did succeed in 10 of 10 trials. Because of the small probability of 10 successes by chance, we would concede that he really possessed the ability to discriminate between the 2 brands.

But suppose he was successful on 9 trials of a 10-trial series? We could readily specify the probability of 9 successes by chance: it would be 10/1024. For reasons which will become apparent later, however, it is better to ascertain the probability of as many as 9 successes in 10 trials (at least 9, or 9 or more, successes). This probability is the probability of exactly 9 successes plus the probability of exactly 10 successes, or 10/1024 + 1/1024 = 11/1024 = about .01. This is sufficiently small that we might decide his performance was based on ability rather than on chance. Note that such a record would occur by chance about 1 time in 100, so we couldn't be sure that he really had the ability.

Next, let us suppose that he was correct on 8 of the 10 trials. The probability of at least 8 successes occurring on a chance basis would be 45/1024 + 10/1024 + 1/1024 = 56/1024 = about .05. Would we now conclude that he had the claimed ability? If we did so conclude, we wouldn't be as sure of our inference as when there were 9 successes, and far less sure than when there were 10 successes. In other words, the smaller the probability of attaining an obtained number of successes by chance, the surer we would be of our conclusion. Suppose he were successful on 7 trials (P = .17 for 7 or more successes). We would no doubt hesitate before conceding that his performance was based on ability to discriminate, since 7 successes can too easily occur on the basis of chance alone.
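The tail probabilities quoted for 10, 9, 8, and 7 successes can be recomputed with a small sketch. It counts, out of the 2^10 = 1024 equally likely right/wrong sequences, those with k or more successes. The sketch assumes pure guessing (p = 1/2). `count_at_least` is a hypothetical helper name.

```python
from math import comb

N_TRIALS = 10
TOTAL = 2 ** N_TRIALS    # 1024 equally likely sequences of right/wrong guesses

def count_at_least(k, n=N_TRIALS):
    """Number of the 2**n outcomes that contain k or more successes."""
    return sum(comb(n, j) for j in range(k, n + 1))

for k in (10, 9, 8, 7):
    count = count_at_least(k)
    print(f"{k} or more correct: {count}/{TOTAL} = {count / TOTAL:.3f}")
```

The printed proportions (.001, .011, .055, .172) are the unrounded versions of the values given in the text.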

We are thus led to the question: What level of probability should one adopt as a criterion for deciding whether an observed performance is based on ability rather than chance? We are not yet ready to attempt an answer to this. It might be remarked here, however, that choosing a level of probability requires weighing two risks. One is the risk of being wrong in concluding that the fellow can discriminate. The other is the risk of attributing his performance to chance when in reality he does have some ability.

Whether a person can discriminate between 2 brands of cigarettes is a simple illustration of the problem of statistical inference, or the testing of hypotheses. For purposes of inference we set up the hypothesis that our friend cannot discriminate between brands. This readily permits us to calculate the probability (P) of as many successes by chance as he attains on a series of trials. If P is sufficiently small we reject the hypothesis of no ability. In so doing we are saying that his number of successes is statistically significant, that is, nonchance. The level of significance associated with rejection of the hypothesis is represented by a probability. If we agree to reject the hypothesis only when the probability of chance success is as low as .01, we will have adopted the P = .01 level of significance. If we are willing to be less sure and require P to be only as low as .05, we will be working at the .05 level of significance. Whether we adopt the .01 or the .05 level is somewhat arbitrary; for this chapter let us quite arbitrarily choose P = .01 as our working level of significance. After considering the more detailed discussion of this issue later in the chapter, the reader may prefer to adopt the .05 or some other level for judging significance.

The binomial expansion (and the normal curve approximation thereto) may be used in a wide variety of situations as a means of testing hypotheses. A general requisite is that we be able to specify the probability of success (or something analogous to success) for a single element (coin, die, trial, etc.). In other words, we need to specify p (and q) so as to use (p + q)^n. Alternatively, when n is not small, we need to calculate the mean and SD in order to utilize the normal curve approximation.

Consider the problem of public opinion polling. In polling studies we are usually interested in whether or not a population of potential voters is split 50-50 on an issue. Accordingly we set up the hypothesis that there is a 50-50 split in the population. This hypothesis is to be accepted or rejected on the basis of information yielded by a sample of N persons. Each person is asked to respond “yes” (agree) or “no” (disagree) to a statement of the given issue. Suppose for the sake of simplicity we take N = 64 and that 42 of them give a yes response. Is this result consistent with the hypothesis of a 50-50 split?

To answer this we note that, so far as the opinion poller is concerned, there is by hypothesis a 50-50 chance that any individual in the sample will say yes. This is so even though the individual, so far as he is concerned, is not giving a chance response. Thus the probability of a yes response for a single individual is 1/2; that is, p = .5 and q = .5 (since q is always 1 − p). Now our sample of 64 is analogous to a trial toss of 64 coins, so we consider the binomial distribution with n = N = 64. The mean = Np = 32, and the SD = √(Npq) = 4. The number of yes responses, 42,
deviates 10 from the mean. (Our normal curve approximation would be slightly better if we used 41.5 − 32 = 9.5 as our deviate; this is the correction for continuity.) Thus we have x/σ = 10/4 = 2.50. Turning to Table A, we find that the probability of obtaining this large a deviation in an upward direction from 32 is about .006. In testing our hypothesis of a 50-50 split, however, we need also to include the probability of obtaining as large a deviation in the opposite direction. Hence we double .006 and have P = .012 as the probability of as large a deviation irrespective of direction. This P is very near our arbitrarily (and temporarily) agreed-upon P = .01 level for judging significance. We therefore reject the hypothesis of an equal split in the population being sampled, and this rejection implies that a majority of the population would endorse the given statement.
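The polling calculation can be sketched in a few lines of Python. The sketch assumes the same figures as the text (N = 64, 42 yeses, p = .5). It uses `math.erf` for the normal areas. It also shows the effect of the continuity correction mentioned in parentheses above.

```python
import math

def normal_cdf(z):
    """Area under the unit normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

N, p = 64, 0.5
yes = 42
mean = N * p                          # 32
sd = math.sqrt(N * p * (1 - p))       # 4

z = (yes - mean) / sd                           # x/sigma = 2.50
p_two_tailed = 2 * (1 - normal_cdf(abs(z)))     # deviation in either direction

# Same test with the correction for continuity: (41.5 - 32)/4
z_cc = (abs(yes - mean) - 0.5) / sd
p_two_tailed_cc = 2 * (1 - normal_cdf(z_cc))

print(f"x/sigma = {z:.2f}, two-tailed P = {p_two_tailed:.4f}")
print(f"corrected x/sigma = {z_cc:.3f}, two-tailed P = {p_two_tailed_cc:.4f}")
```

The corrected P comes out somewhat larger (about .018 rather than .012). The correction makes the test a little more conservative.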

In passing, it should be noted that had the number of yes responses been, say, 35, we would accept the hypothesis of an equal split. But this acceptance would not prove the hypothesis, since 35 could easily be a chance deviation from any of a number of splits, such as 55-45, 54-46, etc. About this we will have more to say later.

Opinion poll results are usually expressed in percentage form, that is, as proportions multiplied by 100. Thus the hypothesis of a 50-50 split in the population implies .50, or 50 per cent, yeses. Our result of 42 yeses for a sample of 64 leads to .656, or 65.6 per cent, yeses. Accordingly it would appear that in testing the significance of the deviation of 42 from 32 we are also testing the deviation of .656 from .50 (in proportion units), or of 65.6 from 50 (in percentage units).

Actually, what we did above was to take

$$\frac{x}{\sigma} = \frac{42 - 32}{\sqrt{64(.5)(.5)}}$$

In converting to proportions we divided both numerator terms by 64 (or N). If we also divide the denominator by 64 (or N), we will not change the value of x/σ. Thus we have

$$\frac{x}{\sigma} = \frac{42/64 - 32/64}{\sqrt{\dfrac{64(.5)(.5)}{(64)^2}}} = \frac{.656 - .50}{\sqrt{\dfrac{(.5)(.5)}{64}}} = 2.52$$


which differs from 2.50 only because of rounding errors. This implies that dividing by N somehow preserves the x/σ nature of the result. The numerator, or x, is a deviation: the deviation of an observed sample proportion from a hypothetical proportion. One might, therefore, deduce that the denominator is a σ. But σ of what?

Let f = number of yeses (frequency of yeses); f can vary from zero to N, with M_f = Np and σ_f = √(Npq). If we divide every possible f by N we have proportions. The mean in proportion units will be M_f/N = Np/N, or simply p. By a principle hinted at on p. 26 (dividing every value by a constant divides the SD by that constant), the standard deviation in proportion units will be σ_f/N = √(Npq)/N = √(pq/N). This last term is precisely what we have above as a denominator. Hence, as a σ, it is the standard deviation of a distribution of proportions. We may symbolize this SD as σ_p.
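A short sketch confirms that the count form and the proportion form give the same critical ratio when unrounded values are used. The 2.52 above therefore differs from 2.50 only through rounding. The figures are the polling example's; the variable names are our own.

```python
import math

N, p, yes = 64, 0.5, 42
q = 1 - p

# Count units: deviation of f from Np, divided by sigma_f = sqrt(Npq)
cr_counts = (yes - N * p) / math.sqrt(N * p * q)

# Proportion units: deviation of f/N from p, divided by sigma_p = sqrt(pq/N)
p_ob = yes / N
sigma_p = math.sqrt(p * q / N)
cr_props = (p_ob - p) / sigma_p

print(cr_counts, cr_props, sigma_p)
```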

In summary, we have np and √(npq) as the mean and SD for a chance distribution of successes (on n coins or n dice or n trials, etc.). We have Np and √(Npq) as the mean and SD of a chance distribution of the number of yes responses for N individuals. We have p and √(pq/N) as the mean and SD of a chance distribution of the proportion of yeses based on samples of N individuals.

In the coin-tossing and analogous situations, each toss or trial leads to a countable number of successes, and the distribution of the number of successes for successive trials follows the binomial. In the polling situation, each sample of N cases leads to a calculable proportion of yeses. The distribution of proportions for successive samples (of the same size) also tends to follow the binomial. Such a distribution is referred to as the random (chance) sampling distribution of proportions.

It is customary to refer to σ_p as the standard error of a proportion. The term “error” is used here because, in effect, we are specifying the variability due to chance (sampling) error. Actually, the sampling distribution of proportions is a theoretical distribution; we usually have just 1 sample proportion (or a few at most). Statistical theory tells us the central value, variability, and shape of the distribution we would expect if we did have a very large number of sample proportions.

The scheme outlined above for testing hypotheses is not, of course, restricted to the cigarette blindfold test and the polling situation. In the first place, the p for the binomial need not be 1/2; our setup might involve a p of 1/3 (e.g., identifying 1 of 3). Nor are we confined to the hypothesis of a 50-50 split when polling (e.g., we might be interested in whether there is a 2 to 1 split). In the second place, we need not limit ourselves to number of successes or number of yeses. The fundamental requirement is that we be able to categorize observations (or individuals) into 2 classes (a dichotomy) such as pass or fail, agree or disagree, like or dislike, present or absent, cured or not cured, etc.
When a hypothesis involving a proportion is tested, the general procedure has two steps. First, express the observed proportion, p_ob, as a deviation from p_h, the proportion expected on the basis of a statistical hypothesis. Then divide this deviation by

$$\sigma_p = \sqrt{\frac{p_h q_h}{N}} \qquad (18a)$$

This gives what is sometimes called a critical ratio (CR). For N not too small and p_h not too extreme, the CR will follow the unit normal curve, and the table of that curve permits us to ascertain the probability of a deviation as great as that observed. The proviso that p_h can't be extreme follows from the fact that the binomial distribution is skewed when p is extreme, say when p is greater than .90 or less than .10 (see the formula for skewness on p. 45). The skewness is also a function of n. It follows that any rule we might adopt to prevent unjustifiable use of the normal curve approximation will be a function of N and p_h. In general, when both Np_h and Nq_h exceed 5 we can safely use the normal curve. If either product is between 5 and 10, we should deduct .5/N from the numerical value of the deviation of p_ob from p_h. This is another way of incorporating the correction for continuity (p. 48).
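The procedure just described can be sketched as a single function. It is a minimal illustration, not a standard routine. `proportion_cr` is a hypothetical name, and it returns the numerical (absolute) CR. Reading the rule of thumb literally, the sketch assumes that a product of 5 or less rules out the normal curve. It treats "between 5 and 10" as greater than 5 and less than 10.

```python
import math

def proportion_cr(p_ob, p_h, n):
    """Numerical critical ratio for an observed proportion p_ob tested against p_h.

    Rule of thumb from the text: use the normal curve only when both n*p_h and
    n*q_h exceed 5, and deduct .5/n from the numerical deviation when either
    product lies between 5 and 10 (correction for continuity).
    """
    q_h = 1.0 - p_h
    smaller = min(n * p_h, n * q_h)
    if smaller <= 5:
        raise ValueError("n*p_h or n*q_h too small; use the exact binomial instead")
    sigma_p = math.sqrt(p_h * q_h / n)            # formula (18a)
    deviation = abs(p_ob - p_h)
    if smaller < 10:                               # correction for continuity
        deviation = max(deviation - 0.5 / n, 0.0)
    return deviation / sigma_p

print(proportion_cr(42 / 64, 0.5, 64))   # polling example, no correction needed
print(proportion_cr(12 / 16, 0.5, 16))   # 12 heads in 16 tosses, corrected
```

For 12 heads in 16 tosses the corrected CR is 1.75. This agrees with the value obtained earlier by shifting from 12 to 11.5.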

Formula (18a) for σ_p has been written with p_h as a value specified by the hypothesis to be tested. As such, the formula measures the chance variation in proportions when the hypothesis is true. Actually, saying “if there is a 50-50 split in opinion” is the same as saying “if the proportion of yeses is .50 in the population.” Let p_pop stand for the population proportion. Then the variation of sample proportions is given by substituting p_pop (and q_pop) in (18a). Usually, however, one has an obtained proportion, doesn't know p_pop, and has no hypothesis in mind. In that case one uses p_ob as an estimate of p_pop, and

$$\sigma_p = \sqrt{\frac{p_{ob} q_{ob}}{N}} \qquad (18b)$$

as an approximation of the standard error of an observed proportion.
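A one-line function captures formula (18b). For illustration, the sketch treats the 42 yeses out of 64 as an obtained proportion with no hypothesis in mind. The function name `se_observed` is hypothetical.

```python
import math

def se_observed(p_ob, n):
    """Approximate standard error of an observed proportion, formula (18b):
    p_ob (and q_ob = 1 - p_ob) stand in for the unknown population values."""
    return math.sqrt(p_ob * (1.0 - p_ob) / n)

se = se_observed(42 / 64, 64)
print(f"{se:.4f}")
```

The result (about .059) is a little smaller than the .0625 obtained under the 50-50 hypothesis. This is because p_ob q_ob is largest when p_ob = .5.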

At this point the student may be somewhat confused by the use of p, first as the probability of, say, success on a single element and then as a proportion. Note, however, that if we were told that .30 (a proportion) of a given group have brown eyes, we could say that the probability that a randomly selected person has brown eyes is .30. Furthermore, when we say that the probability of rolling a snake eye is 1/6 or .1667, we mean that the proportion of snake eyes for a large number of rolls will tend to be .1667.
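The long-run-proportion reading of probability can be illustrated by simulation. In this sketch, a "snake eye" is taken to mean a single die showing one, with probability 1/6. The seed and the number of rolls are arbitrary choices that make the run repeatable.

```python
import random

rng = random.Random(1955)    # fixed seed so the run is repeatable
rolls = 100_000
ones = sum(1 for _ in range(rolls) if rng.randint(1, 6) == 1)
proportion = ones / rolls
print(f"proportion of snake eyes: {proportion:.4f} (probability 1/6 = {1/6:.4f})")
```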

Some sampling theory. To facilitate later discussion we shall now introduce some notions of sampling theory. We will confine our attention to what is known as simple random sampling. Random sampling has two conditions. First, each individual (person, plant, animal, observation, etc.) in a defined population (universe, or supply) shall have an equal chance of being included in the sample. Second, the drawing of one individual shall in no way affect the drawing of another; that is, the drawings must be independent of each other. The first condition is not easily met in practice. The aim is, of course, to obtain a sample which will be representative of the population from which it is drawn, within the limits of random or chance errors.

When dealing with attributes, or the classification of individuals into 2 (or more) categories, the proportion in a given category is a useful descriptive measure. We can then conceive of a population proportion, p_pop, and a proportion, p_ob, obtained on a random sample of N cases. Suppose we could draw successive samples of N, determine p_ob for each sample, and then make a distribution of the several p_ob values. We would expect this distribution to follow the normal curve for N not small and p_pop not extreme. This follows from our discussion of the binomial distribution and the normal curve approximation thereto. The only difference is that we were then speaking of a chance distribution about some hypothetical proportion, p_h. If p_h happened to equal p_pop, we would be dealing with precisely the same distribution of sample values. For example, if the hypothesis of a 50-50 split is true, we would expect the distribution of successive sample proportions to center at .5 and have an SD = σ_p = √(p_h q_h/N) = √((.5)(.5)/N). Likewise, if the population proportion, p_pop, is .5, we would expect the successive sample proportions to have a mean of .5 and an SD = σ_p = √(p_pop q_pop/N) = √((.5)(.5)/N).
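The idea of drawing successive samples can be made concrete with a simulation sketch. It assumes a population in which p_pop = .5, samples of N = 64, and 20,000 samples. All three figures are illustrative choices, not values from the text. The mean and SD of the simulated proportions should come out close to .5 and √((.5)(.5)/64) = .0625.

```python
import random
import statistics

def simulate_sample_proportions(p_pop, n, samples, seed=0):
    """Draw `samples` random samples of size n from a population in which a
    proportion p_pop say yes; return the proportion of yeses in each sample."""
    rng = random.Random(seed)
    return [sum(rng.random() < p_pop for _ in range(n)) / n for _ in range(samples)]

props = simulate_sample_proportions(p_pop=0.5, n=64, samples=20_000, seed=1)
print(f"simulated: mean {statistics.fmean(props):.4f}, SD {statistics.pstdev(props):.4f}")
print(f"theory:    mean 0.5000, SD {(0.5 * 0.5 / 64) ** 0.5:.4f}")
```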



DIFFERENCES BETWEEN PROPORTIONS 


The testing of hypotheses need not be confined to a single proportion. This is fortunate, because in research involving attributes we are more apt to have 2 proportions. Since each is subject to chance (sampling) error, it follows that the difference between them will also be subject to chance error. To test a hypothesis regarding the difference between 2 proportions, we need information concerning the theoretical random (chance) sampling distribution of the differences between proportions. We will need to distinguish 2 different types of situations: (1) proportions based on 2 samples drawn independently from 2 populations, and (2) proportions for responses or observations obtained under 2 different conditions on just 1 sample. For either situation we set up a statistical hypothesis known as a null hypothesis. This hypothesis states that there is no difference between the population proportions. It will be rejected if the obtained difference reaches some prescribed level of significance, but will be accepted otherwise. Stated differently, if the observed difference could readily arise on a chance basis we accept the null hypothesis; if the probability of its occurrence by chance is small we reject the null hypothesis. Note that our statistical hypothesis of no difference may be, and often is, diametrically opposed to the research hypothesis being checked by the data. That is, on the basis of theory or prior observations we may expect a difference, yet for statistical reasons we set up the null hypothesis. If the obtained difference is statistically significant in the expected direction, we regard the data as tending to support the research hypothesis.

Nonindependent proportions. We shall consider first the situation in which the 2 proportions being compared are not based on independent groups but on just 1 group (or on 2 related groups). Suppose we are interested in whether a movie leads to a change of opinion, i.e., to an increase in the proportion favorable to some issue. We select a random sample from some defined population and get a yes (favorable) or no (unfavorable) response from each individual. We then show them the movie and again get a yes or no response from each. Our next step is tabulation. Since we are concerned with possible changes in opinion, we need to arrange our tabulation so as to show how many changed from no to yes, how many from yes to no, and how many “stood pat.” This can be done by placing tally marks in a 2 by 2, or fourfold, table such as that depicted in Table 6. For an individual who gave a yes response the first time and a yes response the second time, a tally would go in the upper right-hand cell; for a yes at first followed by a no, a tally would go in the upper left cell; and so on.


Table 6. Tabulation Plan for Handling Proportions Based on the Same Individuals

                  Frequencies                              Proportions
                      2nd                                      2nd
                 No        Yes                            No       Yes
1st   Yes        A         B         A + B    1st  Yes    a        b        p_1
      No         C         D         C + D         No     c        d        q_1
                 A + C     B + D     N                    q_2      p_2      1.00


Let A, B, C, and D represent the respective frequencies for yes-no, yes-yes, no-no, and no-yes responses. Then A + B is the total number of yeses at first, and B + D is the total number of yeses the second time. If each of these totals is divided by N, we will have the proportions of yeses, p_1 and p_2, for the first (or pre-) and the second (or post-) set of responses, respectively. (Note that the right-hand part of Table 6 is obtained by dividing the 8 frequencies in the left-hand part by N.)
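The tallying step can be sketched in Python. The sketch assumes each individual's pair of responses is recorded as a tuple of the strings "yes" and "no". The 10 pairs below are invented purely for illustration, and `fourfold_table` is a hypothetical helper. Any pair that is not one of the four yes/no combinations would simply be ignored.

```python
from collections import Counter

def fourfold_table(pairs):
    """Tally (first, second) responses into the cells of Table 6:
    A = yes-no, B = yes-yes, C = no-no, D = no-yes."""
    counts = Counter(pairs)
    A = counts[("yes", "no")]
    B = counts[("yes", "yes")]
    C = counts[("no", "no")]
    D = counts[("no", "yes")]
    N = A + B + C + D
    p1 = (A + B) / N    # proportion of yeses, first occasion
    p2 = (B + D) / N    # proportion of yeses, second occasion
    return A, B, C, D, p1, p2

# Hypothetical responses of 10 individuals before and after the movie
pairs = [("yes", "yes"), ("no", "yes"), ("no", "yes"), ("no", "no"),
         ("yes", "no"), ("no", "yes"), ("yes", "yes"), ("no", "no"),
         ("no", "yes"), ("yes", "yes")]
print(fourfold_table(pairs))
```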

Before proceeding to develop a scheme for testing the statistical significance of the difference between the proportions p_1 and p_2, let us note one point. Since p_1 = (A + B)/N and p_2 = (B + D)/N have B in common, p_1 and p_2 can differ only in case the frequency A differs from the frequency D. Our null hypothesis is that the movie produces no change. That is, if the movie could be shown to the entire defined population, the proportion of yeses before and after would be exactly the same. This does not mean that an individual can't change. It does mean that the number of changes from yes to no balances off the number of changes from no to yes. Thus, on the basis of the null hypothesis, we would expect those individuals who gave a changed response to split 50-50 as to direction of change. Stated differently, we would expect 1/2 of the A + D individuals (the changers) to change from yes to no and 1/2 of them to change from no to yes.






This is precisely analogous to tossing A + D coins. We would therefore expect successive samples to yield a chance distribution for the number of no-to-yes changes which would follow the binomial with M = (A + D)/2 and σ = √((A + D)(.5)(.5)); that is, with n = A + D and p = 1/2. Note that, for A + D fixed, the number of yes-to-no changes is complementary to the number of no-to-yes changes, just as when coins are tossed the number of tails is complementary to the number of heads. We need not count both. Thus a test of the significance of the deviation of either D or A from (A + D)/2 tells us whether D differs significantly from A. For A + D small, say 10 or less, we may use the actual binomial expansion to evaluate the change. For A + D large, we will need to resort to the normal curve approximation. This is readily done by expressing D as a deviation from (A + D)/2 and dividing by σ, that is, by √((A + D)(.5)(.5)). This gives a critical ratio,

$$CR = \frac{x}{\sigma} = \frac{D - (A + D)/2}{\sqrt{(A + D)(.5)(.5)}} = \frac{.5D - .5A}{.5\sqrt{A + D}} = \frac{D - A}{\sqrt{A + D}} \qquad (19a)$$


as a value with which to enter Table A to find the probability of as large a deviation as that obtained. If this x/σ, or CR, is 2.58 (or larger), the P = .01 level of significance will have been reached. (We are here dealing with the probability of as large a deviation irrespective of direction.) When A + D is not large, say 11 to 20, our approximation will be appreciably improved by deducting .5 from the numerical deviation of D from (A + D)/2; this is the correction for continuity again.
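Formula (19a) and the correction for continuity fit in a small function. The name `changes_cr` and the default of correcting whenever A + D is 20 or less are our reading of the text, not a fixed convention. The caller is assumed to use the exact binomial instead when A + D is 10 or less. The result is the numerical CR, to be compared with 2.58 for the two-tailed .01 level.

```python
import math

def changes_cr(A, D, correct=None):
    """Numerical critical ratio (19a) for the changers in a fourfold table.

    A = number changing yes -> no, D = number changing no -> yes.
    If `correct` is None, the correction for continuity (deduct .5 from the
    deviation of D from (A + D)/2) is applied when A + D is 20 or less.
    """
    n = A + D
    if correct is None:
        correct = n <= 20
    deviation = abs(D - n / 2)               # equals |D - A| / 2
    if correct:
        deviation = max(deviation - 0.5, 0.0)
    sigma = math.sqrt(n * 0.5 * 0.5)         # sqrt((A + D)(.5)(.5))
    return deviation / sigma

# Hypothetical counts of changers
print(changes_cr(5, 25))                  # A + D = 30, no correction
print(changes_cr(3, 12))                  # A + D = 15, corrected
print(changes_cr(3, 12, correct=False))   # same data, uncorrected
```

Without the correction the function reduces algebraically to (D − A)/√(A + D).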

If we wished to carry our computations through on the basis of proportions, we could express D as a proportion of A + D (similar to what we did on p. 53). As we shall see, however, it is more appropriate to introduce the sample size, N, into the picture. Dividing both numerator and denominator of (19a) by N will not change the value of the fraction; that is,

$$CR = \frac{x}{\sigma} = \frac{D/N - A/N}{\sqrt{\dfrac{A + D}{N^2}}}$$

If we let a = A/N and d = D/N, this may be written as

$$CR = \frac{x}{\sigma} = \frac{d - a}{\sqrt{\dfrac{a + d}{N}}} \qquad (19b)$$

This form for x/σ will make more sense if we again consider Table 6, particularly the right-hand part. Since a + b = p_1 and b + d = p_2, it follows that d − a = p_2 − p_1. Accordingly, a test of the significance of D as a deviation from (A + D)/2 is also a test of the significance of the difference between the proportions of yeses obtained on the 2 occasions.
The denominator of the right-hand side of (19b) must be a standard deviation. Of what? Actually, it is the standard deviation of the theoretical sampling distribution of differences between proportions, each difference being based on 1 sample of size N. Such a standard deviation, as we have noted previously, is referred to as a standard error. Thus we have

$$\sigma_{D_{p(r)}} = \sqrt{\frac{a + d}{N}} \qquad (20)$$

as the standard error of the difference between correlated proportions. The subscript r has been added to indicate that this formula holds for related or correlated proportions. The relationship, or correlation, concept needs a brief word of explanation. If, by chance sampling, p_1 were lower than the population value, we would expect p_2 also to be somewhat low. If p_1 were by chance high, we would expect p_2 to be somewhat high. If p_1 were near the population value (near average), we would expect p_2 to be near average. This varying together is referred to as a co-relationship or correlation. Stated differently, we would not expect the 2 proportions to vary independently of each other for successive samples.
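The proportion form can be checked against the count form with a short sketch. The fourfold table below (A = 5, B = 30, C = 40, D = 25) is hypothetical. The function computes p_2 − p_1, the standard error from formula (20), and the CR from (19b). That CR should equal (D − A)/√(A + D) from (19a).

```python
import math

def correlated_proportions_cr(A, B, C, D):
    """Return (p2 - p1, sigma_Dp(r), CR) for a fourfold table laid out as in
    Table 6, using formulas (19b) and (20)."""
    N = A + B + C + D
    a, b, d = A / N, B / N, D / N
    p1 = a + b                       # proportion of yeses, first occasion
    p2 = b + d                       # proportion of yeses, second occasion
    se = math.sqrt((a + d) / N)      # formula (20)
    return p2 - p1, se, (d - a) / se

# Hypothetical table with N = 100
diff, se, cr = correlated_proportions_cr(5, 30, 40, 25)
print(f"p2 - p1 = {diff:.2f}, SE = {se:.4f}, CR = {cr:.3f}")
```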
The proportions need not be based on the same individuals to be correlated. For example, if we were interested in sex differences in opinion, we might randomly choose families and then ascertain the proportion of yeses among the husbands and also among the wives. For successive samplings the 2 proportions might be correlated because of a possible tendency for husbands and wives to agree on the given issue. As a second example, consider the setup involving the pairing of individuals for the purpose of having comparable experimental and control groups. The fact of pairing signifies that the 2 groups have not been drawn independently in the sampling sense. Hence there might be a tendency for the proportions based on the 2 groups to be more or less alike. (About pairing we will have more to say in the next chapter.)

Another instance for which formulas (19a), (19b), and (20) are applicable is the problem of judging the significance of the difference between proportions of yeses for 2 different questions asked of the same sample of N cases. Since the responses to the 2 questions might tend to vary together, there could be a correlation between the proportions on successive samplings.

In each of the foregoing situations we have pairs of responses, and
our tabulation must follow the scheme set forth in Table 6; i.e., our
tabulation will lead to the frequency of yes-no, yes-yes, no-no, and
no-yes responses.

Formulas (19a), (19b), and (20) are usable in other situations.
When judging whether or not 2 test items differ significantly in
difficulty, we ordinarily have pass-fail data for both items on the
same sample of N cases. Our tabulation leads to the frequencies for
pass-fail, pass-pass, fail-fail, and fail-pass. The kind of response
is irrelevant; it need only be such that a dichotomy is involved for
each item or question.
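As a minimal sketch (not the author's own procedure), the computation for correlated proportions can be written in Python. It assumes the cell labels A = yes-no, B = yes-yes, C = no-no, and D = no-yes from the tabulation just described, and takes the standard error of the correlated difference as √(A + D)/N. The function name `correlated_cr` and the counts in the usage line are hypothetical.

```python
import math


def correlated_cr(a: int, b: int, c: int, d: int) -> tuple[float, float, float]:
    """Return (p1 - p2, standard error, CR) for paired yes/no responses.

    Hypothetical helper. Cells follow the tabulation in the text:
    a = yes-no, b = yes-yes, c = no-no, d = no-yes.
    """
    if a + d == 0:
        raise ValueError("no discordant pairs: the difference is exactly zero")
    n = a + b + c + d
    p1 = (a + b) / n            # proportion of yeses on the first question
    p2 = (b + d) / n            # proportion of yeses on the second question
    diff = p1 - p2              # algebraically equal to (a - d) / n
    se = math.sqrt(a + d) / n   # only the discordant cells enter the standard error
    # The normal-curve reading of the CR is meant for a + d of 10 or more.
    return diff, se, diff / se


diff, se, cr = correlated_cr(a=15, b=40, c=35, d=10)
print(round(diff, 3), round(se, 3), round(cr, 2))  # 0.05 0.05 1.0
```

Because cell B appears in both p₁ and p₂, it cancels from the difference; only the disagreements between the 2 responses carry information about it.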

These 3 formulas may be safely used for any size sample provided
A + D is 10 or more. If A + D is less than 10, the binomial expansion
provides an easily computed test of significance leading to an exact
probability for as great a difference between the proportions as that
observed. The P so obtained needs to be doubled to get the probability
for as great a difference irrespective of direction; otherwise it is
the probability for as large a difference in one direction only. About
this we shall have more to say later under the heading, “One-tailed
vs. two-tailed tests,” p. 62.

Independent proportions. It is not easy to build up a formula for
evaluating the difference between 2 proportions based on 2
independent samples. It can be shown mathematically that the standard
error of the difference between proportions based on independent
samples is given by

$$\sigma_{D_{p(i)}} = \sqrt{p_c q_c \left( \frac{1}{N_1} + \frac{1}{N_2} \right)}$$

in which $p_c$ and $q_c$ are the proportions in the 2 categories for the 2
groups combined. The value of $p_c$ is readily obtained by combining
the 2 frequencies of yeses (or whatever the given category is) and
dividing by $N_c = N_1 + N_2$, and as usual $q_c = 1 - p_c$. An observed
difference divided by $\sigma_{D_{p(i)}}$ will give an x/σ, or CR,
interpretable as a unit normal curve deviate provided the N's are not
too small and $p_c$ is not too extreme. The rule-of-thumb is that $p_c$
or $q_c$ (whichever is smaller) times N₁ or N₂ (whichever is smaller)
shall exceed 5. When this product is between 5 and 10, a correction
for continuity should be incorporated. This may be done by reducing
the numerical (absolute) value of the difference, p₁ − p₂, by the
quantity $\frac{1}{2}\left(\frac{1}{N_1} + \frac{1}{N_2}\right)$.
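A minimal Python sketch of the independent-samples test follows. It assumes the pooled standard error √(p_c q_c (1/N₁ + 1/N₂)) and a continuity correction of ½(1/N₁ + 1/N₂), applied when the rule-of-thumb product falls between 5 and 10; the exact form of the correction is our reading of the text. The function name and the counts are hypothetical.

```python
import math


def independent_cr(yes1: int, n1: int, yes2: int, n2: int) -> tuple[float, bool]:
    """Return (CR, correction_applied) for p1 - p2 from 2 independent samples.

    Hypothetical helper: pools the 2 samples to get p_c, checks the
    rule of thumb, and applies a continuity correction of
    (1/2)(1/n1 + 1/n2) when the rule-of-thumb product lies between 5 and 10.
    """
    p1, p2 = yes1 / n1, yes2 / n2
    p_c = (yes1 + yes2) / (n1 + n2)   # combined proportion of yeses
    q_c = 1 - p_c
    product = min(p_c, q_c) * min(n1, n2)
    if product <= 5:
        raise ValueError("rule of thumb not met; normal approximation not advised")
    se = math.sqrt(p_c * q_c * (1 / n1 + 1 / n2))
    abs_diff = abs(p1 - p2)
    corrected = product < 10
    if corrected:
        # Reduce the absolute difference, never below zero.
        abs_diff = max(0.0, abs_diff - 0.5 * (1 / n1 + 1 / n2))
    return math.copysign(abs_diff / se, p1 - p2), corrected


print(independent_cr(30, 50, 20, 50))  # CR about 2.0, no correction
print(independent_cr(12, 20, 6, 20))   # correction applied
```

The sign of the returned CR follows p₁ − p₂, so the same function serves one-tailed and two-tailed reading.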

SOME GENERAL CONSIDERATIONS 

Before going further we should stop long enough to delineate the
general problem of hypothesis testing, discuss the question of
one-tailed vs. two-tailed tests, and consider the problem of what
level of significance to adopt.

Which hypothesis? In general, successive samplings will yield a
sampling distribution of frequencies or of proportions or of
differences between statistical measures or certain ratios (such as
x/σ or CR or other ratios, to be discussed later). Hypotheses, whether
statistical or research, are usually concerned either with
differences or with deviations. By research hypothesis we mean the
hypothesis set up on the basis of theory or prior observation or on
logical grounds. Such a hypothesis usually involves a prediction
regarding the outcome of an experiment. By statistical hypothesis we
usually mean a null hypothesis set up for the purpose of evaluating
the research hypothesis.

When we are considering possible differences, the null hypothesis,
frequently symbolized as H₀, is pitted against an alternate
hypothesis, H₁. Now H₀ specifies that, for example,
$P_{pop(1)} = P_{pop(2)}$, or that 2 population values do not differ,
whereas H₁ might specify that $P_{pop(1)} \neq P_{pop(2)}$ or that
$P_{pop(1)} > P_{pop(2)}$ or that $P_{pop(1)} < P_{pop(2)}$. Which of
these alternates is appropriate depends on the research hypothesis to
be tested by experiment or what question is to be answered by
experiment. An experiment is carried out which yields sample values,
p₁ and p₂, and the difference between p₁ and p₂ is used to test H₀
against H₁; that is, on the basis of the
obtained difference we are to make a decision as to whether H₀ or
H₁ is true.

Now if H₀ is true we can specify the probability of obtaining by
chance a difference as great as p₁ − p₂, or as great as p₂ − p₁, or as
great as the numerical (irrespective of sign) difference, |p₁ − p₂|.
Let α represent a chosen level of significance: any level such as
P = .10 or P = .05 or P = .01 or P = .001. We reject H₀, the null
hypothesis, if the probability of the obtained result is as small as
the chosen α, and this rejection implies the acceptance of H₁. If α
is not reached we accept H₀, but this acceptance merely says that H₀
could be true; any of a whole series of differences near zero could
also be true. This acceptance-rejection business involves risks, to
be discussed below under “Choice of level of significance.”

One-tailed vs. two-tailed tests. The 3 possible alternates listed
above for H₁ have to do with hypotheses admissible on the basis of
either the research hypothesis or the question for which we seek an
answer by way of an experiment. In general, if H₁ states that
$P_{pop(1)}$ does not equal $P_{pop(2)}$, a two-tailed test is in order;
if H₁ specifies which population value is the larger, a one-tailed
test is used. Whether we should use a one-tailed or a two-tailed test
depends on whether the scientific hypothesis being tested (or at times
the practical decision to be made) demands that we be concerned with
chance deviations in just one direction or in both directions. For
situations in which we wonder whether a performance is better than
chance, as in blindfold cigarette discrimination, we are concerned
only with results in one direction, since any performance in which the
subject is successful on less than .50 of the trials leads us, without
further statistical ado, to accept the hypothesis that he can't
discriminate better than chance. Thus a one-tailed test is
appropriate. But for situations in which we wish to decide whether a
population is split 50-50 on some question, we need to consider chance
sampling deviations in both directions; hence we should use a
two-tailed test.

Next consider the problem of testing the significance of the
difference between 2 proportions. If, for example, we have the
proportion of yeses to some question for a sample of Republicans and
for a sample of Democrats as a basis for deciding whether Republicans
and Democrats differ on the given issue, we would need to use a
two-tailed test: we reject the hypothesis of no difference in case the
obtained difference, irrespective of direction, has a probability of
occurrence which is as small as α, the chosen level of significance. A
one-tailed test would be utilized for judging significance in an
experiment in which, for example, we were trying a new drug to see if
it is better as a preventative than some commonly used drug. The
decision to adopt the new drug is made only in case the new drug leads
to a greater proportion of immunities; results in only one direction
are crucial to the decision to change drugs. But if we were trying out
2 drugs with the idea of adopting the one which is more promising, we
would use a two-tailed test, since significance in either direction is
the basis for decision.

It is sometimes argued that whenever one predicts the outcome of an
experiment on the basis of theory or previous observation, a
one-tailed test is appropriate, since some benefit should accrue to
the researcher who has predicted the direction of the results as
opposed to the investigator who, though obtaining similar results,
has not predicted the outcome. The benefit comes about in that the
x/σ or CR for, say, the P = .01 level of significance need reach only
2.33 for a one-tailed as compared with 2.58 for a two-tailed test. For
the P = .05 level the respective values are 1.64 and 1.96. In other
words, a difference, to be significant, does not have to be as large
for a one-tailed as for a two-tailed test. Since the situation
involving prediction is equivalent to taking H₁ as the hypothesis that
the difference between 2 population values is in a specified
direction, it is not only defensible to use a one-tailed test but
actually better, in the sense that if there is a real difference in
the predicted direction it will be more apt to be detected by a
one-tailed than by a two-tailed test. However, a few words of caution
are in order.
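The critical ratios quoted above can be recovered from the unit normal curve. This Python sketch uses the standard library's `statistics.NormalDist` (Python 3.8 or later); the helper name `critical_cr` is our own.

```python
from statistics import NormalDist

unit_normal = NormalDist()  # mean 0, standard deviation 1


def critical_cr(alpha: float, two_tailed: bool) -> float:
    """x/sigma (CR) that must be reached for significance at level alpha."""
    tail = alpha / 2 if two_tailed else alpha   # a two-tailed test splits alpha
    return unit_normal.inv_cdf(1 - tail)


for alpha in (0.05, 0.01):
    print(alpha, round(critical_cr(alpha, False), 2), round(critical_cr(alpha, True), 2))
# 0.05 1.64 1.96
# 0.01 2.33 2.58
```

Putting all of α in one tail is what lets the one-tailed test declare significance at a smaller CR.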

First, the prediction should be made prior to the collection of
data, that is, independently of the data to be used in testing the
hypothesis. Second, one must be on guard against habit: instances can
be cited where an investigator, after making a series of one-tailed
tests, failed to shift to a two-tailed test when he should have.
Third, in case the results are significant in the direction opposite
to the prediction, one must, in effect, have a red face, because the
outcome is not consistent with either of the admissible hypotheses:
no difference (as set forth by the null hypothesis) or a difference in
the predicted direction (as set forth by the research hypothesis being
tested). It is one thing to have results which simply fail to support
a hypothesis, and quite a different thing to have an outcome which is
diametrically opposed to the hypothesis.

Choice of level of significance. How large should x/σ, or CR, be
before one claims significance? Asked differently, how does one
choose α, the value of P to be required for judging significance?
There is no one answer to this question. For a long time
psychologists insisted on a CR of 3.00 (equivalent to the P = .003
level for a two-tailed test) as a rule-of-thumb value for judging
significance. There might be occasions when one would desire the
assurance represented by a P of .003, but it should be noted that the
acceptance of the null hypothesis whenever CR does not reach 3.00 may
lead too frequently to another type of erroneous conclusion. To
understand this, we must consider what it means when an observed
difference does not lead to the rejection of the null hypothesis.
Acceptance of the null hypothesis does not prove that no difference
exists. For example, a difference of 1 per cent in number of yeses for
2 samples, which yields an x/σ, or CR, of .8, does not prove that there
is no difference in the 2 universe values; it merely indicates that
the real difference could easily be zero. However, the obtained
difference of 1 could be a chance departure from a real difference of
.5 or 1.2 or 1.8 or any of a whole series of values near 1. In other
words, the null hypothesis is one which can be rejected but can never
be proved. Therefore, to accept it too often because we insist on a
high level of significance for rejection means that we are too apt to
overlook real differences. This, plus the fact that we don't
ordinarily need the assurance represented by a significance level of
.003, would suggest that a CR of 3.00 is too high.

At the other extreme, a few are willing to accept as significant a
difference which is 1.5 times its standard error. Since P = .13
(two-tailed) for a CR of 1.5, it is readily seen that such persons
would all too frequently have their publics believing that chance
differences are real. A less lax level, which has had general
acceptance by psychologists recently, is represented by a P of .05.
This also may be a rather low level of significance for announcing
something as “fact.” Those writers who advocate the .05 level for
research workers in psychology cite R. A. Fisher, an eminent
statistician, as their authority, but they fail to point out that
Fisher's applications are to experimental situations in agriculture
and biology where there is far better control of sampling than is
ordinarily the case in psychology.
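The two-tailed probabilities quoted in this section (P of about .003 for a CR of 3.00, P of about .13 for a CR of 1.5) can be checked with a short Python sketch, again using `statistics.NormalDist`; the helper name is hypothetical. The CR of .8 from the 1 per cent example is included to show how ordinary such a deviation is.

```python
from statistics import NormalDist


def two_tailed_p(cr: float) -> float:
    """Chance probability of a deviation at least |cr| standard errors from zero, either direction."""
    return 2 * (1 - NormalDist().cdf(abs(cr)))


for cr in (3.00, 1.5, 0.8):
    print(cr, round(two_tailed_p(cr), 3))
# 3.0 0.003
# 1.5 0.134
# 0.8 0.424
```

A CR of .8 would arise by chance alone in roughly 4 samples out of 10, which is why failing to reject says so little about the true difference.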

If the findings of a study are to be used as the basis either for
theory and further hypotheses or for social action, it does not seem
unreasonable to require a higher level of significance than the .05
level. The answer as to what level, in terms of probability, should
be adopted in order to call a finding statistically significant is
not uninvolved. There is the balancing of risks: that of accepting
the null hypothesis when to do so may mean the overlooking of a real
difference, against that of rejecting the null hypothesis when doing
so may lead to the acceptance of a chance difference as real. There is
the question of the likelihood of independent verification, and,
finally, there is the whim of personal preference: some individuals
are more eager than others to announce a “significant” finding, while
others are more cautious in this regard. It follows that no hard and
fast rule can be given; one can interpret a given finding in terms of
the probability of its occurrence by chance and then note whether the
P is near the significance level adopted prior to the experiment
because it seemed appropriate when all factors were weighed.

The reader will have noted from the foregoing that the testing of
hypotheses involves the possibility of 2 types of erroneous
conclusions. These are usually referred to as type I and type II
errors, which we shall now define more specifically. Consider again
the null hypothesis that no difference exists between 2 population
values. If we reject this hypothesis when in fact it is true, we will
have committed a type I error. If we accept the hypothesis when in
fact it is false, we will have made a type II error.

The factors in choosing a level of significance might be further
clarified by a somewhat different approach. Notice that when we adopt
P = α as our level of significance we are definitely specifying the
probability of committing a type I error; it is simply α. By taking a
smaller and smaller α we can reduce the risk of making a type I
error. But what happens to the probability of making a type II error
as we thus reduce the risk of a type I error? The answer, and the
reasoning back of the answer, can readily be understood provided one
is willing to follow carefully the following line of argument.
Suppose we have the proportions of immunity in 2 samples to which 2
drugs have been administered, and our question is whether drug A is
superior to drug B (a one-tailed test situation). Suppose further that
the standard error of the difference between the 2 proportions is
.02. The exposition will be somewhat simplified if we change to
percentage units; this is readily accomplished by shifting decimals
for the proportions and also for the standard error, the latter
becoming 2 in percentage units.

[Fig. 9. Type I and type II errors. Surviving panel label: (b) α = .05, CR = 1.64, D must = 3.28.]

In Fig. 9 will be found a series of sampling distribution curves, all
with σ = 2, but with locations differing according to supposed true,
or population, differences of 0, 4, and 8. The top part (a) is for
α = .10, the middle (b) for α = .05, and the bottom (c) for α = .01.
In each part an ordinate has been erected at the difference required
for significance at the given α level of significance. These required
differences spring from the fact that for a one-tailed test the x/σ
values that cut off .10, .05, and .01 of a normal curve are 1.28,
1.64, and 2.33 respectively, and since σ is 2, the
respective required differences in percentages would be 2.56, 3.28,
and 4.66. Sample differences falling beyond these values would be in
what are termed critical regions for rejecting the null hypothesis at
the 3 respective α values. For example, values beyond 4.66 would be in
the critical region when the P = .01 level of significance is
adopted.

From these several sampling distribution curves and with the help of
a table of the normal curve functions we can specify the probability
of committing a type II error for a specified (supposed) true
difference. If we keep in mind that the probability of a type I error
is α (= .10, .05, or .01), and that we can make a type I error only
when the true difference is zero, we see that the proportionate areas
beyond 2.56, 3.28, and 4.66 for the 3 curves centering at zero
represent the probabilities of making a type I error for the
respective α values. For all sample values in the regions to the left
of 2.56, 3.28, and 4.66 we would correctly accept the null hypothesis
when in reality it is true. The probabilities for correct acceptance
are given by 1 − α, or .90, .95, and .99 respectively.

Let us now consider the supposition that the true difference is 2.
If 2 is the true difference, any obtained difference falling in the
region to the right of 2.56, 3.28, and 4.66 will, for the respective
levels of significance, lead to the correct decision that a true
difference exists. The probabilities for these correct inferences are
obtained by expressing 2.56, 3.28, and 4.66 as deviations from 2 (the
supposed true value being considered), taking each deviation relative
to the standard error of the difference (= 2), and thus obtaining x/σ
values of (2.56 − 2)/2 = .28, (3.28 − 2)/2 = .64, and
(4.66 − 2)/2 = 1.33. Looking these values up in a table of the normal
curve, we get probabilities for correctly rejecting the null
hypothesis of .39, .26, and .09 for the respective specified levels of
significance, when the true difference is 2 percentage points.
Probabilities for correctly rejecting the null hypothesis are usually
symbolized by β. Note that all sample values falling in the region to
the left of 2.56, 3.28, and 4.66 (for the curves centering at 2) will
lead to the false acceptance of the null hypothesis. The probabilities
of making type II errors will correspond to the proportionate areas,
for the curves centering at 2, to the left of these 3 points (when we
have the one-tailed test as considered here). These probabilities
will, of course, be given to us by 1 − β. Thus we have
1 − .39 = .61, 1 − .26 = .74, and 1 − .09 = .91 as the probabilities
of making a type II error when the true difference is 2, for the .10,
.05, and .01 levels of significance. Note that taking α smaller and
smaller increases the probability of making a type II error.

For a true difference of 4 we can by a similar line of reasoning
obtain the probability of correctly rejecting the null hypothesis and
the probability of falsely accepting the null hypothesis, when using
any one of the specified values of α. These probabilities will
involve the areas, under the curves centering at 4, to the right of
2.56, 3.28, and 4.66 (for the β's) and to the left of these same
points (for the type II errors). The student can, as an exercise,
readily verify that the areas to the right of 2.56, 3.28, and 4.66
are approximately .76, .64, and .37 respectively. Subtracting each of
these from unity will yield the probabilities, .24, .36, and .63, of
falsely accepting the null hypothesis, or committing a type II error,
when the true difference is 4 and for α's of .10, .05, and .01. Again,
the smaller we take α, the larger the probability of making a type II
error.
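The β and 1 − β computations for the drug example can be sketched in Python. The sketch assumes the one-tailed setup of the text, σ_D = 2 percentage units, and the rounded critical x/σ values 1.28, 1.64, and 2.33, so that its output matches the two-decimal figures worked above. With a true difference of 0 the rejection probability is simply the type I error rate α. Names such as `beta` are our own.

```python
from statistics import NormalDist

# Rounded one-tailed critical x/sigma values used in the text.
Z_ONE_TAILED = {0.10: 1.28, 0.05: 1.64, 0.01: 2.33}
SIGMA_D = 2.0  # standard error of the difference, in percentage units


def beta(true_diff: float, alpha: float, sigma_d: float = SIGMA_D) -> float:
    """Probability of rejecting H0 (one-tailed) when the true difference is true_diff."""
    required = Z_ONE_TAILED[alpha] * sigma_d          # 2.56, 3.28, or 4.66 when sigma_d = 2
    x_over_sigma = (required - true_diff) / sigma_d   # required point as a deviation from the true value
    return 1 - NormalDist().cdf(x_over_sigma)         # area to the right of the required point


for d in (0, 2, 4):
    row = [round(beta(d, a), 2) for a in (0.10, 0.05, 0.01)]
    print(d, row, [round(1 - b, 2) for b in row])
# 0 [0.1, 0.05, 0.01] [0.9, 0.95, 0.99]
# 2 [0.39, 0.26, 0.09] [0.61, 0.74, 0.91]
# 4 [0.76, 0.64, 0.37] [0.24, 0.36, 0.63]
```

Looping `d` over other supposed true differences gives the remaining rows of Table 7, up to rounding in the last digit.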

The probabilities given in the last 2 paragraphs, along with similar
figures for other supposed true differences, have been assembled in
Table 7. A careful study of this table reveals the general rule that
the smaller the value of α, the smaller the probability (β) of
correctly rejecting the null hypothesis and the larger the probability
(1 − β) of committing a type II error. Thus when we reduce the
probability of making a type I error by choosing α small, we do so at
the risk of more often making a type II error. Note also that
regardless of α, the probability of making a type II error decreases
as the true differences deviate farther and farther from zero. This is
another way of saying that the larger the true difference, the more
apt we are to detect it by experiment, and conversely, the smaller the
difference, the less likely we are to discover it.

Table 7. Probability (β) of Correctly Rejecting the Null Hypothesis
and Probability (1 − β) of Type II Error Associated with 3 Levels of
Significance (α's of .10, .05, .01) When Certain True Differences Are
Supposed to Exist

                β, for α of             1 − β, for α of
True diff.     .10     .05     .01       .10     .05     .01
     1         .22     .13     .03       .78     .87     .97
     2         .39     .26     .09       .61     .74     .91
     3         .59     .44     .20       .41     .56     .80
     4         .76     .64     .37       .24     .36     .63
     5         .89     .79     .57       .11     .21     .43
     6         .96     .91     .75       .04     .09     .25
     7         .99     .97     .88       .01     .03     .12
     8        .997     .99     .95      .003     .01     .05
     9       >.999    .997    .975     <.001    .003    .025
    10       >.999   >.999    .996     <.001   <.001    .004

Incidentally, the value of β for various possible true differences is
referred to as the power of the statistical test for detecting the
difference. If we plotted the β's in, say, the α = .05 column of
Table 7 against the scale of possible differences, we would have an
ascending curve which would represent the power function of the test.
It is beyond the scope of this text to consider in detail the concepts
having to do with the power of a test. It should be remarked, however,
that statistical tests differ in their power, and to understand this
we would need to have more information regarding various tests that
might be used to test a given research hypothesis. For instance, power
depends upon the choice of the critical region for rejecting the null
hypothesis: for the first drug problem considered above, a one-tailed
test is more powerful than a two-tailed test. In the next chapter we
will be considering, among other things, differences between averages
or central values, at which time it will be found that a test based on
comparing means will be more powerful than one based on medians.

Perhaps the discerning student will have noted that increasing sample
size (or sizes) tends to reduce standard errors. In the above
discussion we supposed that we had N's and proportions such that the
standard error of the difference ($\sigma_D$) was 2 percentage units.
Quadrupling the N's would reduce $\sigma_D$ to 1 percentage unit. How
would this affect the results deduced from Fig. 9 and set forth in
Table 7? Take, for example, α = .01 and suppose a true difference of
2 percentage points. With $\sigma_D = 1$, an obtained difference would
have to fall in the region beyond 2.33 × 1 = 2.33 to be judged
significant at the .01 level. With a true difference of 2, the
proportion of sample values falling beyond 2.33, calculated by taking
(2.33 − 2)/1 = .33 = x/σ, is found to be .37. This is a β value to be
contrasted with a β of .09 given in Table 7. We see, therefore, that
quadrupling the sample N's has increased fourfold the probability of
detecting a difference of 2 points. Or, stated differently, the
probability of a type II error has been reduced from .91 to .63. The
moral is plain: one way of reducing the risk of making a type II
error, without increasing the risk of a type I error, is to increase
N or N's. Whether this is feasible will usually depend upon the
resources available to the investigator.
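The effect of quadrupling the N's can be checked the same way. This Python sketch (helper name hypothetical) assumes, as the text does, that quadrupling the N's halves σ_D from 2 to 1 percentage unit, and it keeps the one-tailed .01 critical x/σ of 2.33.

```python
from statistics import NormalDist


def beta_one_tailed(true_diff: float, sigma_d: float, z_crit: float) -> float:
    """One-tailed probability of rejecting H0 when the true difference is true_diff."""
    required = z_crit * sigma_d   # difference needed for significance
    return 1 - NormalDist().cdf((required - true_diff) / sigma_d)


# Quadrupling the N's is taken to halve sigma_D, from 2 to 1 percentage unit.
for sigma_d in (2.0, 1.0):
    b = beta_one_tailed(true_diff=2, sigma_d=sigma_d, z_crit=2.33)
    print(sigma_d, round(b, 2), round(1 - b, 2))
# 2.0 0.09 0.91
# 1.0 0.37 0.63
```

The critical x/σ, and hence the type I risk, is unchanged between the two runs; only the spread of the sampling distribution shrinks.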

Although contemporary mathematical statisticians usually consider
hypothesis testing in terms of a definite reject-accept decision
according to whether the chosen level of significance is or is not
reached, there is another possibility. One might follow the rule of
rejecting the null hypothesis when P is less than .01 (say),
accepting it when P is greater than .10, and reserving judgment when P
is between .10 and .01. This, in effect, introduces a region of
indecision, or calls for a postponement of decision until the
experiment is repeated or more data are collected. Another
possibility, when a decision is not required for some practical
reason, is simply to report that a difference is significant at the
.09 or the .04 or the .002 or whatever level is reached, and then let
the reader evaluate the finding according to his own preferred level
of significance (which he is apt to do anyway unless he is too
naive).
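The three-region rule can be stated as a small Python function. The cutoffs .01 and .10 are the illustrative values from the text, not fixed conventions, and the function name is hypothetical.

```python
def three_region_decision(p: float, reject_below: float = 0.01, accept_above: float = 0.10) -> str:
    """Reject H0, accept H0, or reserve judgment, following the three-region rule.

    Both cutoffs are choices the investigator makes before the experiment.
    """
    if not reject_below < accept_above:
        raise ValueError("reject_below must be smaller than accept_above")
    if p < reject_below:
        return "reject H0"
    if p > accept_above:
        return "accept H0"
    return "reserve judgment"   # region of indecision: repeat or collect more data


for p in (0.002, 0.04, 0.09, 0.30):
    print(p, three_region_decision(p))
# 0.002 reject H0
# 0.04 reserve judgment
# 0.09 reserve judgment
# 0.3 accept H0
```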

There are a couple of other points regarding significance. First, a
statistically significant difference doesn't necessarily mean a
difference either of practical significance or of scientific import.
Sometimes a “what of it” is not an impertinence. Second, the habit of
merely checking to see whether a result reaches a chosen level of
significance should not lead one to overlook the possibility of
claiming, when appropriate, that a much higher level of significance
was attained than the preresearch chosen level.

SUMMARY 

In this chapter we have given a brief account of the concept of
probability and have sketched procedures for applying probability
notions in the testing of hypotheses involving frequencies and
proportions (or percentages). We have noted the conditions for which
it is safe to use an x/σ (or CR) and the normal curve to approximate
probabilities. If these conditions do not hold (when samples are small
or proportions are extreme), one can obtain P exactly by way of the
actual binomial expansion for situations involving one proportion and
for two correlated proportions. For proportions based on independent
samples, exact P's may be ascertained by another, and more
complicated, method to be presented later (p. 240).

The discussion of this chapter is only an introduction to the theory
of statistical inference, or the use of probability in the testing of
hypotheses. We have, however, developed the general principles. The
extension of the theory to hypotheses involving continuous variables
for relatively simple situations will be given in the next 2
chapters, with methods for more complex situations being postponed to
later chapters (14, 15, 16, 17, 18). In Chapter 13 we shall discuss
more extensive procedures for handling hypotheses regarding
frequencies and proportions.



CHAPTER 6 


Inference: Continuous Variables 


As will be recalled, a frequency distribution for measurements on a
continuous variable is describable with respect to central value,
variability, skewness, and kurtosis; hence hypotheses involving
continuous variables will be concerned with at least 1 of the
descriptive measures of these 4 features of a frequency distribution.
To test a given hypothesis we need information regarding the sampling
behavior of the descriptive measure being used (or of some ratio
containing the measure).

In the previous chapter we were able to make certain easy
deductions. We saw, at the intuitive level, that the sampling
distribution of proportions and of differences between proportions
tends, under specified conditions, to be normal in distribution with
specifiable standard deviation, and that we could set up a deviation,
x, such that the ratio x/σ tends to follow the unit normal curve.
Unfortunately, the behavior of random sampling distributions of the
measures that describe frequency distributions cannot so readily be
determined. Accordingly, we will need to lean on the deductions made
by the mathematical statistician, who has the task of ascertaining
mathematically the characteristics of random sampling distributions
when certain conditions and assumptions hold. We can learn how to use
his results as a basis for testing hypotheses without necessarily
understanding his mathematical derivations.

Since hypotheses involving means arise frequently in practice and
since inferences based on means serve to illustrate further the
general theory of statistical inference, we shall give considerable
detail on sampling errors connected with means. We shall present
first an easily duplicated demonstration of the chance variation of
means, and then a discussion of some theory and its use as a basis for
hypothesis testing. This chapter will be restricted to the large
sample situation, with requisite sample size specified at appropriate
times.

EMPIRICAL DEMONSTRATION 

The operation of chance sampling errors for means and standard
deviations can be illustrated by tossing, say, 7 coins 50 times and
tabulating the number of heads per toss. The obtained frequencies
will usually vary somewhat from those expected, which would be
proportional to 1, 7, 21, 35, 35, 21, 7, 1 (as obtained by the
binomial expansion). When the mean number of heads for 50 tosses is
computed, it is not likely to be exactly 3.5 (np, the mean of the
expected distribution), and the discrepancy from 3.5 can be
attributed to chance. Likewise, 100 tosses will show departures from
the expected frequencies, and consequently the mean based on 100
tosses will differ more or less from 3.5. Furthermore, and for the
same reason, the standard deviation of the obtained distribution of
heads will likely differ from 1.323 (√(npq), the SD of the expected
frequencies). The student, as an exercise, can demonstrate the
foregoing statements by actually tossing coins. Indeed it will be
quite instructive if each class member tosses 7 coins 50 times, each
time tallying the number of heads that turn up. This will lead to a
frequency distribution running (possibly) from 0 to 7 heads, with an
N of 50. Then a second series of 50 tosses should be made, thus
providing a second distribution. The 2 frequency distributions can be
combined, so each student will have 3 distributions, 2 with N's of 50
and 1 with an N of 100. Note that chance is so operating as to produce
a distribution somewhat similar to the expected, but at the same time
is operating in such a manner as to lead to discrepancies between
observed and expected frequencies.
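For readers without coins at hand, one student's 50 tosses of 7 coins can be simulated in Python. This is a sketch only: it assumes fair coins generated by Python's `random` module with an arbitrary fixed seed, so the observed frequencies, mean, and SD will differ from any real class exercise, just as chance results should.

```python
import random
from math import comb, sqrt
from statistics import mean, pstdev

N_COINS, N_TOSSES = 7, 50

# Expected frequencies are proportional to the binomial coefficients 1, 7, 21, 35, 35, 21, 7, 1.
expected_weights = [comb(N_COINS, k) for k in range(N_COINS + 1)]
expected_mean = N_COINS * 0.5             # np = 3.5
expected_sd = sqrt(N_COINS * 0.5 * 0.5)   # sqrt(npq), about 1.323

rng = random.Random(1955)  # arbitrary seed, fixed so the run can be repeated
# Each toss: count how many of the 7 fair coins show heads.
heads = [sum(rng.random() < 0.5 for _ in range(N_COINS)) for _ in range(N_TOSSES)]
observed = [heads.count(k) for k in range(N_COINS + 1)]

print("expected frequencies:", [round(N_TOSSES * w / 2 ** N_COINS, 1) for w in expected_weights])
print("observed frequencies:", observed)
print("mean", round(mean(heads), 3), "vs expected", expected_mean)
# pstdev divides by N, matching the SD of an obtained distribution.
print("SD  ", round(pstdev(heads), 3), "vs expected", round(expected_sd, 3))
```

Changing the seed plays the part of a different student; the discrepancies from 3.5 and 1.323 change from run to run.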

Each student should compute the means and the standard deviations for
each of the 3 distributions. Note how far these values depart from
the expected mean of 3.5 and the expected standard deviation of
1.323. Then the several means and standard deviations secured by the
class members should be brought together. In order better to
understand what happens when each of several persons tosses 7 coins
50 times, i.e., takes a sample of 50 tosses, a frequency distribution
of the means, and also of the SD's, based on 50 tosses should be made.
Likewise a separate distribution should be made for the M's based on
100 tosses, and also for the SD's. A study of these distributions
should provide answers to such questions as: Their central tendencies
are near what values? What is the extent of dispersion for these
distributions of M's and σ's? Is there any difference in the
dispersion for the distribution of means based on 50 tosses and that
based on 100 tosses? How would you account for this difference? In
general, what is the shape of these distributions of M's and σ's?

In Table 8 will be found the distributions of the means obtained by
several of the author's classes. Though these are not models for
number of intervals, they are nevertheless sufficient as a basis for
answering the foregoing questions. Note that both distributions
appear to be normal, that both center very near the mean of the
theoretical distribution (3.5), and that the variability for means
based on 100 tosses is less than that based on 50 tosses. It would
thus seem that means based on 100 tosses are somewhat more stable, or
less variable, than those based on 50 tosses. Does this suggest that a
larger number of tosses, i.e., a larger sample, would tend to iron out
the chance factors that operate to produce discrepancies between the
observed distribution of number of heads and the expected
distribution calculated by the binomial expansion? Do you think that
means based on 500 tosses would show less dispersion than means based
on 100 tosses?

Table 8. Distributions of 600 Means Based on 50 Tosses and 300 Means
Based on 100 Tosses of 7 Coins

Class interval                  50 Tosses  100 Tosses
4.00-4.09                               3
3.90-3.99                              14
3.80-3.89                              35           4
3.70-3.79                              50          23
3.60-3.69                              98          58
3.50-3.59                             119          78
3.40-3.49                             120          85
3.30-3.39                              85          32
3.20-3.29                              52          17
3.10-3.19                              21           3
3.00-3.09                               2
2.90-2.99                               1
Number of means                       600         300
Mean of means                       3.516       3.513
SD* of distribution of means         .190        .135
Expected SD                          .187        .132

* Corrected for grouping.

According to the mathematical statisticians, the standard deviation of the distribution of means is expected to equal 1.323 (the expected $\sigma$ of the distribution of number of heads) divided by the square root of the sample size. Note at the bottom of Table 8 that the SD's of the distributions of means, .190 and .135, are very near the expected values of .187 and .132 obtained from $1.323/\sqrt{50}$ and $1.323/\sqrt{100}$, respectively.
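The class exercise can also be imitated by computer. The sketch below is an illustration, not part of the original procedure: it assumes fair coins and uses Python's `random` module as a stand-in for physical tosses, and the function name `toss_means` is hypothetical. With a fixed seed the figures will not reproduce Table 8, but the SD of the means should fall near $1.323/\sqrt{N}$.

```python
import random
import statistics

def toss_means(n_tosses, n_samples, n_coins=7, seed=1):
    """Return (mean of means, SD of means) for repeated samples of tosses.

    Each toss throws n_coins fair coins; the score is the number of heads.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(n_samples):
        # getrandbits(n_coins) yields n_coins independent fair bits; count the 1s as heads
        heads = [bin(rng.getrandbits(n_coins)).count("1") for _ in range(n_tosses)]
        means.append(sum(heads) / n_tosses)
    # pstdev divides by N, as the text does for sigma
    return statistics.fmean(means), statistics.pstdev(means)

expected_sd = (7 * 0.5 * 0.5) ** 0.5   # 1.323, SD of number of heads for 7 coins
for n_tosses, n_samples in ((50, 600), (100, 300)):
    mean_of_means, sd_of_means = toss_means(n_tosses, n_samples)
    print(f"{n_tosses} tosses: mean of means {mean_of_means:.3f}, "
          f"SD {sd_of_means:.3f}, expected {expected_sd / n_tosses ** 0.5:.3f}")
```

Raising `n_tosses` to 500 is a quick way to answer the question posed above about means based on 500 tosses.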
Summarizing the results of the above empirical work, we see that the means for successive samples tend to distribute themselves normally about the expected or universe mean, with a spread or standard deviation very near the value predicted by mathematical theory. The student should keep these empirical distributions and the deductions therefrom in mind as we now proceed to a more detailed consideration of what the mathematical statistician says will happen when successive samples of a given size are drawn from a defined universe, population, or supply.

MORE SAMPLING THEORY

The discussion here holds for what is known as simple random sampling. As specified in the previous chapter, the conditions for simple random sampling are that the sample should be drawn in such a way that each individual (person, plant, animal, etc.) in the defined universe shall have an equal chance of being included in the sample, and that the drawing of one individual shall in no way affect the drawing of another. The aim is, of course, to obtain a sample which will, within limits of random or chance errors, be representative of the universe from which it was drawn. Let

$N$ = the number of cases, or size of sample,
$M$ = the mean of any sample (known, i.e., computed),
$\sigma$ = the SD of any sample (known, i.e., computed),
$M_{pop}$ = the mean of the defined population (unknown),
$\sigma_{pop}$ = the SD of the defined population (unknown).

The $M_{pop}$ and $\sigma_{pop}$ are for the distribution of scores or measurements for all the individuals in the defined universe. It is not assumed that this universe distribution is exactly normal; it may be skewed slightly. Strictly speaking, the number, $N_{pop}$, of cases in the universe should be infinitely large, but failure to meet this requirement is not serious. As will be seen later, the adjustment necessary when a sample of $N$ cases is drawn from a limited (finite) universe of $N_{pop}$ cases is of the order of $N/N_{pop}$; if it is known that $N_{pop}$ is very large relative to $N$, the formulations about to be presented will be sufficiently accurate for all practical purposes.
Now suppose we draw a sample of $N$ cases, compute the mean and standard deviation, then draw another sample of the same size and compute its mean and standard deviation, and so on until a large number of samples, say 10,000, have been drawn. We will then have 10,000 means and 10,000 standard deviations, each based on $N$ cases. When we make a distribution of the 10,000 means and of the 10,000 standard deviations, we have random sampling distributions. From the viewpoint of mathematical rigor, the number of successive samples should be much larger than 10,000, certainly far larger than the 600, or 300, successive samples of Table 8, in which we have only the beginning of 2 random sampling distributions.
By rather complex mathematical methods it can be shown that, if successive samples of constant size, $N$, are drawn randomly from a normally distributed universe or population with mean equal to $M_{pop}$ and SD equal to $\sigma_{pop}$, the successive sample means will be normally distributed about $M_{pop}$, and the standard deviation of this sampling distribution will be $\sigma_{pop}/\sqrt{N}$. The random sampling distribution of the successive standard deviations will center at $\sigma_{pop}$ (there is a small bias here which need not concern us at this time). For $N$ large (100 or more) this distribution of $\sigma$'s will be approximately normal, with standard deviation equal to $\sigma_{pop}/\sqrt{2N}$. These mathematical findings have often been checked empirically. Our Table 8 provides a limited check on the sampling theory regarding the mean.
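The two sampling results just stated can be checked by brute force. The following is a minimal sketch under assumed values: a normal universe with hypothetical $M_{pop} = 50$ and $\sigma_{pop} = 10$, 10,000 samples of $N = 100$, and Python's `random.gauss` standing in for drawing individuals. The theory predicts an SD of 1.0 for the $M$'s and about .707 for the $\sigma$'s, with the $\sigma$'s centering slightly below 10 (the small bias mentioned above).

```python
import math
import random

def mean_and_sd(values):
    """Mean and SD with divisor N, matching the text's sigma."""
    n = len(values)
    m = sum(values) / n
    return m, math.sqrt(sum((v - m) ** 2 for v in values) / n)

def sampling_distributions(m_pop, sigma_pop, n, n_samples=10_000, seed=0):
    """Draw n_samples random samples of size n from a normal universe; keep each M and sigma."""
    rng = random.Random(seed)
    means, sds = [], []
    for _ in range(n_samples):
        sample = [rng.gauss(m_pop, sigma_pop) for _ in range(n)]
        m, s = mean_and_sd(sample)
        means.append(m)
        sds.append(s)
    return means, sds

# Hypothetical universe: M_pop = 50, sigma_pop = 10; samples of N = 100.
means, sds = sampling_distributions(50.0, 10.0, 100)
center_m, spread_m = mean_and_sd(means)
center_s, spread_s = mean_and_sd(sds)
print(f"M's: center {center_m:.2f}, SD {spread_m:.3f} (theory {10 / math.sqrt(100):.3f})")
print(f"sigma's: center {center_s:.2f}, SD {spread_s:.3f} (theory {10 / math.sqrt(2 * 100):.3f})")
```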
We are now in position to consider a term used in the previous chapter. In general, the standard error of a statistical measure is the standard deviation of the sampling distribution for the given measure. The square of the standard error is called the sampling variance. For the practical statistician, the sampling distribution is hypothetical, and hence its standard deviation must be determined by a different formula from that used for computation from an actual distribution. The value given by $\sigma_{pop}/\sqrt{N}$ is called the standard error of the mean and may be designated as ${}_t\sigma_M$; the subindex $t$ is used to indicate "true" value (not estimated). Each sample mean can be expressed in standard form (analogous to standard scores) as $(M - M_{pop})/{}_t\sigma_M$, and these relative deviates will form a normal distribution with mean of zero and standard deviation of unity. By reference to Table A we can readily specify the chances of obtaining a sample mean yielding a deviation as great as that for a given $M$, provided the value of $M_{pop}$ is known. But in practical work $M_{pop}$ is the unknown about which we desire to make an inference on the basis of just 1 sample.

Before resolving this practical problem, we must call attention to the fact that the universe standard deviation, $\sigma_{pop}$, needed to obtain ${}_t\sigma_M$ is also an unknown. A single sample will yield a standard deviation, $\sigma$, which, being a sample value, will of course deviate more or less from $\sigma_{pop}$. In order that an inference about $M_{pop}$ may be made from a single sample, ${}_t\sigma_M$ is estimated by using $\sigma/\sqrt{N}$, i.e., the unknown $\sigma_{pop}$ is replaced by the sample $\sigma$ as an estimate. Instead of the true value for the standard error of the mean as given by $\sigma_{pop}/\sqrt{N}$, we have an approximate value, $\sigma/\sqrt{N}$. Let $\sigma_M$, defined as $\sigma/\sqrt{N}$, stand for the approximate standard error.

The ignorance concerning $\sigma_{pop}$, and the consequent approximate value for the standard error of a given mean, lead to a reconsideration of the sampling distribution of means expressed as relative deviates. As already pointed out, the means from successive samples will be distributed normally, and the relative deviates, $(M - M_{pop})/{}_t\sigma_M$, will likewise be distributed normally, since ${}_t\sigma_M = \sigma_{pop}/\sqrt{N}$ is a constant. When (as is nearly always the case) we have $\sigma$ instead of $\sigma_{pop}$ and wish to make an inference about a universe mean, we need to know something of the sampling behavior of successive sample means expressed as relative deviates from $M_{pop}$, where $\sigma_M$ is not a constant but varies from sample to sample because the several sample standard deviations vary. Thus the relative deviate of the first sample mean will be $(M_1 - M_{pop})$ divided by $\sigma_1/\sqrt{N}$; for the second sample, $(M_2 - M_{pop})$ divided by $\sigma_2/\sqrt{N}$; and so on. The distribution of these relative deviates will not approximate normality unless $N$ is fairly large. Thus the use of an estimate of $\sigma_{pop}$ in determining $\sigma_M$ imposes the restriction that $N$ shall not be too small. If $N$ is not less than 30, we can safely use the normal curve as the basis for drawing an inference or testing a hypothesis regarding $M_{pop}$. This chapter's discussion of sampling is therefore not applicable unless $N$ is greater than 30. The refinements necessary for $N$'s less than 30 will be given in the next chapter.
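The effect of replacing $\sigma_{pop}$ by $\sigma$ can be seen in a small simulation. This sketch assumes a hypothetical normal universe with $M_{pop} = 0$ and $\sigma_{pop} = 1$, uses the text's divisor-$N$ $\sigma$, and counts how often $(M - M_{pop})/(\sigma/\sqrt{N})$ falls beyond $\pm 1.96$, where a normal curve would give .05. The share should be well above .05 for $N = 5$ and close to .05 for $N = 100$; the function name `prop_beyond` is illustrative.

```python
import math
import random

def prop_beyond(n, n_samples=5_000, cut=1.96, seed=5):
    """Share of samples whose (M - M_pop)/(sigma/sqrt(N)) lies beyond +-cut.

    sigma is the sample SD with divisor N, as in the text.
    """
    rng = random.Random(seed)
    beyond = 0
    for _ in range(n_samples):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]   # hypothetical universe: M_pop = 0, sigma_pop = 1
        m = sum(xs) / n
        sigma = math.sqrt(sum((x - m) ** 2 for x in xs) / n)
        if abs(m / (sigma / math.sqrt(n))) > cut:
            beyond += 1
    return beyond / n_samples

shares = {n: prop_beyond(n) for n in (5, 30, 100)}
for n, share in shares.items():
    print(f"N = {n:3d}: share beyond 1.96 = {share:.3f}")
```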


HYPOTHESES REGARDING A SINGLE MEASURE 

Whether the foregoing theory is used as a basis for making an inference about a population value or for testing some hypothesis depends on the practical problem faced by the investigator. We shall now consider hypothesis testing, and later we shall discuss a type of inference which is useful both when we do and when we do not have a research hypothesis in mind.

Single mean. The procedure for testing a hypothesis about a population mean on the basis of a sample mean (and SD) for $N$ cases is very similar to that for testing a hypothesis when we have a sample proportion (discussed earlier, pp. 52-55). We let $M_h$ stand for a hypothesized value of $M_{pop}$. Our sample mean, $M$, taken as a deviation from $M_h$, is expressed in the form of an $x/\sigma$, that is, as $(M - M_h)/\sigma_M$. The theory tells us that if $M_h$ is true (i.e., corresponds to $M_{pop}$), successive sample $M$'s will be distributed normally about $M_h$ with standard deviation $= \sigma_M$ (approximately). In testing the given hypothesis we are merely raising the question as to whether it is reasonable to believe that our observed sample mean, $M$, belongs to a sampling distribution centered at $M_h$; or, stated differently, does $M$ deviate significantly from $M_h$? To answer this we need to know the probability of as large a deviation on the basis of chance sampling errors, and to get this probability we need only enter Table A with $(M - M_h)/\sigma_M$ as an $x/\sigma$ (or CR). If we have decided to adopt the $P = .01$ level of significance, we reject the hypothesis when this $x/\sigma$ reaches 2.58 (for a two-tailed test) or 2.33 (for a one-tailed test); otherwise we accept the hypothesis.

Actually, there are relatively few occasions in psychological research for which either scientific theory or prior observation provides us with a hypothesis concerning the mean for a population.
As an example of a situation for which the testing of a hypothesis about a mean is appropriate, we cite IQ tests. For reasons which we shall not discuss here, a properly constructed test should yield 100 as the average IQ for the population of children at any given age level. Consider Form L of the 1937 Revision of the Stanford-Binet Scale. For age 7 a sample of 202 gives a mean of 101.78 and a $\sigma$ of 16.18. The value of $\sigma_M$ becomes $16.18/\sqrt{202} = 1.14$. From these figures we have $(M - M_h)/\sigma_M = (101.78 - 100)/1.14 = 1.56$ as an $x/\sigma$. Turning to Table A, we find that the $P$ for as large a deviation from 100 (irrespective of direction, since a two-tailed test is needed here) is .12. Since this probability is not as small as our arbitrarily chosen $P = .01$ level of significance, we accept the hypothesis that the 1937 Stanford-Binet meets the requirement of yielding an average of 100 at age 7. That the scale is not entirely satisfactory in this regard is evident when we consider the $M$ of 104.28 and $\sigma$ of 16.42 for a sample of 204 nine-year-olds. We have $\sigma_M = 1.15$, which leads to an $x/\sigma$ of $(104.28 - 100)/1.15 = 3.72$. Since the probability of as large a deviation is about .0002, we reject the hypothesis that the scale would yield a population mean of 100 at age 9.
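The Stanford-Binet arithmetic can be retraced as follows. This is a sketch, not part of the text: the two-tailed probability comes from `math.erfc` rather than Table A, $\sigma_M$ is rounded to 2 places as in the text, and `iq_check` is a hypothetical helper. It should reproduce $\sigma_M$ = 1.14 and 1.15, $x/\sigma$ = 1.56 and 3.72, and $P$ of about .12 and .0002.

```python
from math import erfc, sqrt

def two_tailed_p(cr):
    """P of a normal deviate at least |cr| in either direction."""
    return erfc(abs(cr) / sqrt(2))

def iq_check(mean, sd, n, hypothesized=100.0):
    sigma_m = round(sd / sqrt(n), 2)      # rounded to 2 places, as in the text
    cr = (mean - hypothesized) / sigma_m
    return sigma_m, round(cr, 2), two_tailed_p(cr)

for age, (mean, sd, n) in {7: (101.78, 16.18, 202), 9: (104.28, 16.42, 204)}.items():
    sigma_m, cr, p = iq_check(mean, sd, n)
    verdict = "reject" if p <= 0.01 else "accept"
    print(f"age {age}: sigma_M = {sigma_m}, x/sigma = {cr}, P = {p:.4f} -> {verdict} at .01")
```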

Significance of mean change. A frequently encountered 
problem is that of evaluating changes in order to say whether 
some provided experience or change in conditions leads to a shift 
in performance. 

Let

$X_1$ = score prior to experience (or under one condition),

$X_2$ = score after the experience (or under second condition),

$D = X_2 - X_1$ = change score.

Or we might take $D = X_1 - X_2$ if losses instead of gains are of interest, but regardless of which way we define the $D$ score, the subtraction is made in the same direction for all $N$ cases and negative signs are kept. A sample of $N$ individuals will give us $N$ changes, or $N$ $D$'s. We can either make or conceive of a distribution of the $D$'s. This distribution will have a mean, $M_D$, and a standard deviation, $\sigma_D$, whence we can get the standard error of the mean difference: $\sigma_{M_D} = \sigma_D/\sqrt{N}$. In other words, a mean change is treated just like any other mean. Regardless of one's hunch or prediction about the effect of the experience (or the effect of the change in conditions), one sets up the null hypothesis that there is no effect. This is equivalent to saying that, if we had $X_1$ and $X_2$ scores on the defined population, the value of $M_D$ would be zero. If this hypothesis is true and if we were to take successive samples of size $N$, we would expect the sample means to be distributed normally about zero with SD $= \sigma_{M_D}$. To test the null hypothesis we simply take our obtained $M_D$ as a deviation from the null value of zero and divide by $\sigma_{M_D}$; that is, $(M_D - 0)/\sigma_{M_D} = M_D/\sigma_{M_D}$. This $x/\sigma$ is then used as an entry into Table A in order to specify the probability of as large a mean difference as our sample $M_D$ arising solely on the basis of chance sampling. Whether we reject or accept the hypothesis of no effect depends on whether $P$ does or does not reach the chosen level of significance. We should use a one-tailed test here if the research hypothesis predicted the direction of the change, but if we had no a priori hypothesis as to the direction of change we would need to use the two-tailed test.
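The whole test of a mean change fits in one small function. This is a sketch under stated assumptions: the 40 pairs of scores are hypothetical, $\sigma_D$ uses the gross-score form $\sqrt{\Sigma D^2/N - M_D^2}$ with divisor $N$, the $P$ is two-tailed from `math.erfc`, and `mean_change_test` is an illustrative name. For these data $M_D = 1.0$, $\sigma_D = \sqrt{3}$, and the CR is about 3.65, so the hypothesis of no effect would be rejected at the .01 level.

```python
from math import erfc, sqrt

def mean_change_test(before, after):
    """Large-sample test of H0: M_D = 0 for paired scores (two-tailed P)."""
    if len(before) != len(after):
        raise ValueError("scores must be paired")
    d = [x2 - x1 for x1, x2 in zip(before, after)]        # D = X2 - X1, signs kept
    n = len(d)
    m_d = sum(d) / n                                        # algebraic sum over N
    sigma_d = sqrt(sum(v * v for v in d) / n - m_d ** 2)    # gross-score formula
    sigma_md = sigma_d / sqrt(n)                            # standard error of M_D
    cr = m_d / sigma_md                                     # (M_D - 0) / sigma_MD
    return m_d, sigma_md, cr, erfc(abs(cr) / sqrt(2))

# Hypothetical data: 40 individuals, before and after some experience
diffs = [3, -1, 2, 0, 4, 1, -2, 2, 1, 0] * 4
before = [20 + (i % 7) for i in range(40)]
after = [b + d for b, d in zip(before, diffs)]
m_d, se, cr, p = mean_change_test(before, after)
print(f"M_D = {m_d:.2f}, sigma_MD = {se:.3f}, CR = {cr:.2f}, P = {p:.5f}")
```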
A word should be inserted about the required computations, since there is some danger of confusion when one is confronted with the calculation of $M$ and $\sigma$ for scores (changes) which are both positive and negative, and sometimes zero. The gross score formula for the mean (2) and that for the standard deviation (6b) are applicable provided one takes $\Sigma D$ (equivalent to $\Sigma X$) as the algebraic sum. The equivalent of $\Sigma X^2$, that is, $\Sigma D^2$, raises no problem, since the squaring process automatically eliminates negative signs. There are two reasons why one should make a frequency distribution of the $D$'s. First, the theory assumes that the $D$'s approximate a normal distribution; if a distribution is made, one has at least a rough check on this assumption (there are statistical methods for checking this assumption; see p. 82 and also p. 236). Second, if $N$ is sizable, computation from a frequency distribution is more economical of time than use of the gross score formulas. In laying out the intervals, one must provide a place for tabulating zero $D$'s. This can conveniently be accomplished by the following illustrative scheme, which includes only the 4 intervals near zero: 2 to 3, 0 to 1, -1 to -2, -3 to -4 (for $i = 2$); 3 to 5, 0 to 2, -1 to -3, -4 to -6 (for $i = 3$); 4 to 7, 0 to 3, -1 to -4, -5 to -8 (for $i = 4$); etc. Note that the last 2 intervals in each set are for negative $D$'s. An $AO$ taken as the midpoint of the bottom interval will be a negative number, and must be treated as such when entered into formula (3).
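A sketch of the tabulating scheme, assuming integer $D$'s and assuming that the pattern shown for the 4 intervals near zero continues in steps of $i$ in both directions. It also assumes that formula (3), which is not reproduced in this section, has the usual grouped-data form $M = AO + i(\Sigma fx')/N$, with $x'$ the step deviation of each interval from the one holding $AO$. The function names and the 10 change scores are hypothetical; with so few cases the grouped mean (1.1) differs noticeably from the exact mean (1.0).

```python
from collections import Counter

def interval_for(d, i):
    """(low, high) of the interval holding integer d; 0 to i-1 is the zero interval."""
    if d >= 0:
        low = (d // i) * i
        return low, low + i - 1
    k = (-d - 1) // i                  # 0 for -1..-i, 1 for -(i+1)..-2i, and so on
    return -(k + 1) * i, -(k * i + 1)

def grouped_mean(ds, i):
    """Mean from the frequency distribution, with AO at the bottom interval's midpoint."""
    freq = Counter(interval_for(d, i) for d in ds)
    midpoint = {iv: (iv[0] + iv[1]) / 2 for iv in freq}
    ao = min(midpoint.values())        # negative whenever the bottom interval holds negative D's
    sum_fx = sum(f * round((midpoint[iv] - ao) / i) for iv, f in freq.items())
    return ao + i * sum_fx / len(ds)

# Hypothetical change scores, tabulated with i = 2
ds = [3, -1, 2, 0, 4, 1, -2, 2, 1, 0]
for iv, f in sorted(Counter(interval_for(d, 2) for d in ds).items(), reverse=True):
    print(iv, f)
print("grouped mean:", round(grouped_mean(ds, 2), 2), "exact mean:", sum(ds) / len(ds))
```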

Other single measures. The general theory of statistical inference may be extended to testing hypotheses concerning any descriptive measure, provided information is available (from the mathematical statistician) concerning the characteristics of the random sampling distribution of the measure. When the sampling distribution is normal in form with known or estimable variability, we may proceed to test hypotheses by setting up an $x/\sigma$ (or CR). For this purpose we need formulas for the standard errors of different measures. The formulas about to be presented are based on the assumption that the score distribution is normal or approximately so.

As previously noted, for $N$ greater than 30 we may safely use

$$\sigma_M = \frac{\sigma}{\sqrt{N}} \qquad (22)$$

as the standard error of the mean. For $N$ greater than 100 it is safe to take

$$\sigma_{mdn} = \frac{1.253\,\sigma}{\sqrt{N}}$$

as the standard error of the median. A comparison of the standard error of the mean with that of the median indicates that the mean fluctuates less than the median; i.e., the mean is a more stable measure of central value than the median. In order to reduce the standard error of the median to the same magnitude as that of the mean it is necessary to take 57 per cent more cases, i.e., increase $N$ by 57 per cent (since $1.253^2 \approx 1.57$). It follows from this that the use of the median for distributions which are reasonably normal in form is equivalent to throwing away a large proportion of the cases.
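The factor 1.253 and the 57 per cent figure can be checked empirically. This sketch assumes normal samples of a hypothetical size $N = 101$ (odd, so each median is a single score) and 4,000 replications; the ratio of the SD of the medians to the SD of the means should come out near 1.25, and `se_ratio_median_to_mean` is an illustrative name.

```python
import random
import statistics

def se_ratio_median_to_mean(n=101, n_samples=4000, seed=2):
    """Ratio of the SD of sample medians to the SD of sample means, normal universe."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(n_samples):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(sum(sample) / n)
        medians.append(statistics.median(sample))
    return statistics.pstdev(medians) / statistics.pstdev(means)

ratio = se_ratio_median_to_mean()
print(f"empirical SE(median)/SE(mean): {ratio:.3f}")
print(f"extra cases needed for the median: {1.253 ** 2 - 1:.0%}")   # 57%
```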

The sampling errors involved in measures of dispersion are

$$\sigma_\sigma = \frac{.707\,\sigma}{\sqrt{N}}, \qquad \sigma_{MD} = \frac{.756\,(MD)}{\sqrt{N}}, \qquad \sigma_Q = \frac{1.166\,Q}{\sqrt{N}} \qquad (23)$$

From these error formulas it will be seen that, considering the error relative to the magnitude of the measures of dispersion, $\sigma$ is the most stable measure of variation. Provided $N$ is 100 or more, the sampling distributions for these measures of dispersion are such that their standard errors can be utilized in exactly the same way as the standard error of the mean.
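The multipliers .707, .756, and 1.166 can be estimated by simulation, which also shows the ordering that makes $\sigma$ the most stable. In this sketch the universe is assumed normal, $N = 200$ and 3,000 replications are hypothetical choices, $MD$ is taken as the mean deviation about the sample mean, and $Q$ as half the distance between the quartiles returned by `statistics.quantiles`. Each estimate is $SE \times \sqrt{N}$ divided by the average value of the measure.

```python
import random
import statistics

def dispersion_measures(sample):
    """sigma (divisor N), mean deviation MD, and semi-interquartile range Q."""
    n = len(sample)
    m = sum(sample) / n
    sigma = (sum((x - m) ** 2 for x in sample) / n) ** 0.5
    md = sum(abs(x - m) for x in sample) / n
    q1, _, q3 = statistics.quantiles(sample, n=4)
    return sigma, md, (q3 - q1) / 2

def empirical_multipliers(n=200, n_samples=3000, seed=3):
    """Estimate SE * sqrt(N) / measure for each measure from normal samples."""
    rng = random.Random(seed)
    rows = [dispersion_measures([rng.gauss(0.0, 1.0) for _ in range(n)])
            for _ in range(n_samples)]
    result = {}
    for name, values in zip(("sigma", "MD", "Q"), zip(*rows)):
        result[name] = statistics.pstdev(values) * n ** 0.5 / statistics.fmean(values)
    return result

multipliers = empirical_multipliers()
print({name: round(v, 3) for name, v in multipliers.items()})   # compare .707, .756, 1.166
```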

The standard errors for measures of skewness and kurtosis, as defined on p. 28, are

$$\sigma_{g_1} = \sqrt{\frac{6}{N}} \qquad (24a)$$

$$\sigma_{g_2} = \sqrt{\frac{24}{N}} \qquad (24b)$$

These 2 formulas are based on the assumption that the sample has been drawn from a normally distributed population, and therefore they can be legitimately used in testing the assumption of normality. It will be recalled that, for normal distributions, both $g_1$ and $g_2$ are equal to zero; for a sample they may not be zero, but sample values should not show a greater deviation from zero than can reasonably be attributed to chance. If a sample yields a $g_1$ value which is more than, say, 2.58 times its sampling error, one would suspect that the sample was not drawn from a symmetrically distributed supply. Likewise, if $g_2$ deviates more than 2.58 times its standard error, one would question whether it is reasonable to believe that the population or supply is distributed with normal kurtosis. A two-tailed test is appropriate here, and consequently choosing 2.58 is equivalent to adopting the .01 level of significance.
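A sketch of this check for normality. It assumes that the definitions on p. 28 are the moment-based ones, $g_1 = m_3/m_2^{3/2}$ and $g_2 = m_4/m_2^2 - 3$ (with moments about the mean, divisor $N$), and it divides by the standard errors of formulas (24a) and (24b). The 500 scores are a hypothetical, clearly skewed sample drawn from an exponential distribution, so both CR's should exceed 2.58 by a wide margin.

```python
import random

def skew_kurt_test(xs):
    """Return g1, g2 and their CR's using sqrt(6/N) and sqrt(24/N) as standard errors."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3
    cr_g1 = g1 / (6 / n) ** 0.5        # formula (24a)
    cr_g2 = g2 / (24 / n) ** 0.5       # formula (24b)
    return g1, g2, cr_g1, cr_g2

rng = random.Random(4)
skewed = [rng.expovariate(1.0) for _ in range(500)]   # hypothetical, clearly skewed sample
g1, g2, cr1, cr2 = skew_kurt_test(skewed)
print(f"g1 = {g1:.2f} (CR {cr1:.1f}), g2 = {g2:.2f} (CR {cr2:.1f})")
```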


HYPOTHESES ABOUT DIFFERENCES 

One of the foremost problems in practical statistics is the comparison of group trends. We may wonder whether one college group is superior to another, whether practice on a task improves performance, whether rats learn more rapidly when food or when water is the incentive, whether reaction time is faster to sound than to light, whether the sexes show a difference in variational tendency, whether one learning method is better than another, etc. In order to answer questions like the above, it is necessary to make observations on samples from 2 groups, or on the same group under 2 different experimental conditions, and then to compute appropriate statistical measures for the variable upon which we wish to make the comparison.
Thus, typically, we have 2 samples of $N_1$ and $N_2$ cases, or 2 sets of scores on just $N$ cases under 2 different conditions, with means $M_1$ and $M_2$ and standard deviations $\sigma_1$ and $\sigma_2$, where the subscripts refer to the 2 sets of scores. As we have learned, each mean is subject to sampling fluctuations; therefore the difference between the means will also be subject to sampling fluctuations. Even though $M_{pop(1)} = M_{pop(2)}$, there may be a difference between sample means because of chance sampling errors. To test an obtained difference for significance we will need a measure of the sampling error of differences, i.e., the standard error of the difference between two means. Knowing this standard error, we can set up the null hypothesis that there is no difference between the two population means and then reject or accept this hypothesis according to whether the obtained difference does or does not reach an appropriate level of significance.

Here, as in the case of the difference between proportions, we 
must distinguish between the situation where our 2 means are 
based on independent as opposed to nonindependent (correlated) 
scores. 

Difference between correlated means. Let us again consider the method outlined above for testing the significance of a mean change. As implied there, the $X_1$ and $X_2$ scores could stand for the performance of $N$ individuals under 2 different conditions. A little simple algebra at this point will lead to some interesting results. As before, we let

$$D = X_2 - X_1$$

By definition the mean of the distribution of these $N$ difference scores will be

$$M_D = \frac{\Sigma D}{N} = \frac{\Sigma(X_2 - X_1)}{N} = \frac{\Sigma X_2}{N} - \frac{\Sigma X_1}{N}$$

hence

$$M_D = M_2 - M_1 = D_M$$

by which we see that the mean of the differences is equal to the difference between the means. This will, of course, be true for every sample. It follows therefore that when we test the significance of $M_D$ as a deviation from zero we are also testing the significance of $D_M$ as a deviation from zero. In other words, we are testing the significance of the difference between 2 means based on the same $N$ cases.

When testing $M_D$ we calculated $\sigma_D$, thence $\sigma_{M_D}$. Let us consider a bit further the standard deviation of the distribution of differences, $\sigma_D$. We first express the $D$'s as deviations from their own mean, i.e., $d = D - M_D$. Since $D = X_2 - X_1$ and $M_D = M_2 - M_1$, we have

$$d = (X_2 - X_1) - (M_2 - M_1)$$

which, when the parentheses are removed and the terms shifted, becomes

$$d = X_2 - M_2 - X_1 + M_1$$

or

$$d = (X_2 - M_2) - (X_1 - M_1)$$

Both these new parentheses terms define deviation units of the type $x = X - M$, so that $d = x_2 - x_1$. The standard deviation squared, or variance, of the differences can be expressed by substituting $d$ for $x$ in formula (4); thus

$$\sigma_D^2 = \frac{\Sigma d^2}{N}$$

If we replace $d$ by its equivalent, we have

$$\sigma_D^2 = \frac{\Sigma(x_2 - x_1)^2}{N} = \frac{\Sigma x_2^2}{N} + \frac{\Sigma x_1^2}{N} - \frac{2\Sigma x_2 x_1}{N}$$

The first 2 of the 3 terms on the right are obviously the variances for the second and first sets of scores. The last term, involving the sum of the cross products of $x_2$ and the $x_1$ with which it is paired, has to do with the degree of correlation between, or similarity of, the scores that belong to the same individual. The reader is asked to take on faith, without further explanation here, the fact that the last term becomes $2r_{12}\sigma_1\sigma_2$, in which $r$ is a measure of correlation. Hence we can write

$$\sigma_D^2 = \sigma_2^2 + \sigma_1^2 - 2r_{12}\sigma_1\sigma_2$$

Since the standard error of any mean is given by dividing the standard deviation by the square root of $N$, we secure the standard error of the mean difference by dividing $\sigma_D$ by $\sqrt{N}$, i.e.,

$$\sigma_{M_D} = \frac{\sigma_D}{\sqrt{N}} = \sqrt{\frac{\sigma_1^2 + \sigma_2^2 - 2r_{12}\sigma_1\sigma_2}{N}} = \sqrt{\frac{\sigma_1^2}{N} + \frac{\sigma_2^2}{N} - \frac{2r_{12}\sigma_1\sigma_2}{N}}$$

The first 2 terms under the last radical are the sampling variances of the 2 means, and since $2r_{12}\sigma_1\sigma_2/N$ can be written as

$$2r_{12}\,\frac{\sigma_1}{\sqrt{N}}\,\frac{\sigma_2}{\sqrt{N}} = 2r_{12}\,\sigma_{M_1}\sigma_{M_2}$$

we have finally that

$$\sigma_{M_D} = \sqrt{\sigma_{M_1}^2 + \sigma_{M_2}^2 - 2r_{12}\sigma_{M_1}\sigma_{M_2}}$$

Since each $M_D = D_M$, it follows that $\sigma_{M_D} = \sigma_{D_M}$; that is, the standard error of the mean difference is equal to the standard error of the difference between the 2 means. Thus we have 2 ways of evaluating a difference between nonindependent means. We can compute $M_D$, $\sigma_D$, thence

$$\sigma_{M_D} = \frac{\sigma_D}{\sqrt{N}} \qquad (25a)$$

or we can compute $M_1$, $M_2$, $\sigma_1$, $\sigma_2$, and $r_{12}$, and then obtain

$$\sigma_{M_D} = \sqrt{\sigma_{M_1}^2 + \sigma_{M_2}^2 - 2r_{12}\sigma_{M_1}\sigma_{M_2}} = \sigma_{D_M} \qquad (25b)$$

Formula (25b) is usually referred to as the standard error of the difference between correlated means, hence the symbol $\sigma_{D_M}$.

But by working with the difference between paired scores, we can obtain the standard error of the mean difference (= difference between means) without computing $r$. Even after we have learned how to compute $r$, it matters not whether we compute the standard error of the difference between means of related scores by formula (25b) or whether we compute its equivalent, the standard error of the mean of the differences.



Strictly speaking, the $r_{12}$ in (25b) should be written as $r_{M_1M_2}$ so as to indicate that it is a measure of the extent to which successive pairs of means vary together, but it can be shown that the correlation between means is the same as $r_{12}$, the correlation between the scores entering into the means.

Since $M_D = D_M$ and $\sigma_{M_D} = \sigma_{D_M}$, it should be obvious that when testing the null hypothesis we have

$$x/\sigma \text{ (or CR)} = \frac{M_D}{\sigma_{M_D}} = \frac{D_M}{\sigma_{D_M}}$$

That is, the procedure for testing the null hypothesis that $M_D$ is zero for a population is equivalent to testing the null hypothesis that $M_{pop(1)} = M_{pop(2)}$, where the subscripts 1 and 2 indicate that we are considering 2 populations of scores, 1 for each condition.
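The equivalence of (25a) and (25b) is an algebraic identity, so it can be confirmed on any set of paired scores. The sketch below uses 8 hypothetical pairs, too few for the large-sample CR but enough to show that both routes give the same standard error, provided every $\sigma$ and $r_{12}$ is computed with divisor $N$ as in the derivation above. The helper names are illustrative.

```python
from math import sqrt

def mean_sd(xs):
    """Mean and SD with divisor N."""
    n = len(xs)
    m = sum(xs) / n
    return m, sqrt(sum((x - m) ** 2 for x in xs) / n)

def correlation(xs, ys):
    """r = (sum of cross products of deviations) / (N * sigma_x * sigma_y)."""
    mx, sx = mean_sd(xs)
    my, sy = mean_sd(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) * sx * sy)

def se_two_ways(x1, x2):
    n = len(x1)
    d = [b - a for a, b in zip(x1, x2)]              # D = X2 - X1
    m_d, sigma_d = mean_sd(d)
    se_25a = sigma_d / sqrt(n)                        # formula (25a)
    m1, s1 = mean_sd(x1)
    m2, s2 = mean_sd(x2)
    r12 = correlation(x1, x2)
    se_m1, se_m2 = s1 / sqrt(n), s2 / sqrt(n)
    se_25b = sqrt(se_m1 ** 2 + se_m2 ** 2 - 2 * r12 * se_m1 * se_m2)   # formula (25b)
    return m_d, m2 - m1, se_25a, se_25b

x1 = [12, 15, 11, 18, 14, 16, 13, 17]   # hypothetical scores, condition 1
x2 = [14, 16, 14, 19, 15, 19, 13, 20]   # hypothetical scores, condition 2, same persons
print(se_two_ways(x1, x2))               # M_D equals D_M; the two standard errors agree
```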

Formulas (25a) and (25b) are appropriate in a number of situations in which an $X_1$ score is somehow paired with an $X_2$ score. Some of the possibilities are the following.

a. $X_1$ as first trial, then practice, then $X_2$ as later trial; same person.

b. $X_1$ as initial, then experience, then $X_2$ as final; same person.

c. $X_1$ as pretest, then experience, then $X_2$ as posttest; same person.

d. $X_1$ under experimental conditions vs. $X_2$ under normal (or control); same person.

e. $X_1$ in one experimental condition vs. $X_2$ in another; same person.

f. $X_1$ as experimental vs. $X_2$ as control; twin or litter pair.

g. $X_1$ as experimental vs. $X_2$ as control; unrelated persons, but matched by pairing on pertinent variables. Ditto, for 2 experimental conditions.

For the last mentioned situation (g), which is commonly employed in experimental work, one can think of having drawn $N$ individuals at random for one group, then forming the second group by selecting individuals who can be paired with the members of the first group on the basis of variables which need to be controlled; thus any found difference between $M_1$ and $M_2$ will not be attributable to differences between the 2 groups with respect to the variables used in forming the pairs, since the pairing tends to make the groups equivalent on the pairing variables. This same pairing procedure, and also twin or litter pairs, can be used for situation e. Furthermore, as we shall see below, the $X_1$ and $X_2$ scores can themselves stand for changes: $X_1$ the change from pretest to posttest under an experimental condition and $X_2$ the change under another experimental condition or under control conditions.

The statistical advantages of having scores which are somehow related will be discussed later under the caption "Reduction of sampling errors."

Difference between independent means. When we have means for 2 samples which have been drawn independently, there will be no way of pairing scores except on a chance basis, and chance pairing will tend to produce a zero correlation. In fact, if we took all possible pairs the correlation would be exactly zero. Thus the correlation term in (25b) vanishes, so that the standard error of the difference between means based on independent samples becomes

$$\sigma_{D_M} = \sqrt{\sigma_{M_1}^2 + \sigma_{M_2}^2} = \sqrt{\frac{\sigma_1^2}{N_1} + \frac{\sigma_2^2}{N_2}} \qquad (26)$$

This formula is not restricted to samples of the same size; i.e., $N_1$ need not equal $N_2$. The right-hand form of (26) has an obvious computational advantage.

The $\sigma_{D_M}$ obtainable by formula (26) may be used in exactly the same manner as the standard error of the difference by formulas (25a) and (25b). Again, one sets up the null hypothesis that $M_{pop(1)} = M_{pop(2)}$, or that the difference between the population means is zero. If it is zero, the sampling distribution of $D_M$ resulting from successive replications will center at zero with SD $= \sigma_{D_M}$. If $D_M/\sigma_{D_M}$ (or CR) is sufficiently large, one rejects the null hypothesis; if not, it is accepted. In other words, the general procedure for testing hypotheses about differences is precisely the same for means (and other statistical measures) as that outlined in the previous chapter. The student would do well to review the discussion dealing with admissible hypotheses, one-tailed vs. two-tailed tests, choice of level of significance, and the 2 types of error one risks in testing hypotheses.
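For independent samples the test needs only the summary values for each group. This sketch applies the right-hand form of (26); the function name and the two sets of summary figures are hypothetical, chosen so that each sampling variance is exactly 1.0. The resulting CR of about 2.12 has a two-tailed $P$ of about .034, so the null hypothesis would be accepted at the .01 level though not at the .05 level.

```python
from math import erfc, sqrt

def independent_means_test(m1, s1, n1, m2, s2, n2):
    """CR test of M_pop(1) = M_pop(2) for independent samples (two-tailed P)."""
    se_diff = sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)    # formula (26), right-hand form
    d_m = m2 - m1
    cr = d_m / se_diff
    return se_diff, cr, erfc(abs(cr) / sqrt(2))

# Hypothetical summary values: group 1 (M = 50, sigma = 10, N = 100), group 2 (M = 53, sigma = 12, N = 144)
se, cr, p = independent_means_test(50.0, 10.0, 100, 53.0, 12.0, 144)
print(f"sigma_DM = {se:.3f}, CR = {cr:.2f}, P = {p:.3f}")
```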

Differences between other descriptive measures. The general theory of hypothesis testing is applicable for descriptive measures other than proportions or means. The general pattern for the standard error of the difference between any 2 statistical measures, say $S_1$ and $S_2$, is

$$\sigma_{D_S} = \sqrt{\sigma_{S_1}^2 + \sigma_{S_2}^2 - 2r_{S_1S_2}\sigma_{S_1}\sigma_{S_2}}$$

That is, we need to know the standard error for both $S_1$ and $S_2$ and a measure of the correlation between $S_1$ and $S_2$ in case of nonindependence (the $r$ term drops out for independently drawn samples). The appropriate correlation between means is, as we have indicated, known to be $r_{12}$, the correlation between the sets of scores; and the correlation between $\sigma$'s is known to be $r_{12}^2$. Thus the standard error of the difference between 2 $\sigma$'s based on the same individuals, or on scores related consanguineously or related by pairing on pertinent variables, is given by

$$\sigma_{D_\sigma} = \sqrt{\sigma_{\sigma_1}^2 + \sigma_{\sigma_2}^2 - 2r_{12}^2\sigma_{\sigma_1}\sigma_{\sigma_2}} \qquad (27a)$$

and for $\sigma$'s based on independent samples

$$\sigma_{D_\sigma} = \sqrt{\sigma_{\sigma_1}^2 + \sigma_{\sigma_2}^2} = .707\sqrt{\frac{\sigma_1^2}{N_1} + \frac{\sigma_2^2}{N_2}} \qquad (27b)$$

These formulas are valid for large $N$'s (100 or more), and to test the null hypothesis one simply takes $D_\sigma/\sigma_{D_\sigma}$ (the difference between the 2 $\sigma$'s divided by its standard error) as a CR, with $\sigma_{D_\sigma}$ being computed by (27a) or by (27b), whichever is appropriate. (For small $N$'s, see Chapter 14.)
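Formulas (27a) and (27b) translate directly into code. In this sketch the standard error of each $\sigma$ is taken as $.707\sigma/\sqrt{N}$, the correlated case assumes the same $N$ individuals in both sets, and the summary values ($\sigma_1 = 10$, $\sigma_2 = 12$, $N = 200$, $r_{12} = .60$) are hypothetical. With these figures the CR is about 2.56 if the samples are independent but about 3.19 if the same individuals are used, illustrating how the $r_{12}^2$ term shrinks the standard error.

```python
from math import sqrt

def se_sigma(sigma, n):
    """Standard error of a sigma for large N: .707 sigma / sqrt(N)."""
    return 0.707 * sigma / sqrt(n)

def se_diff_sigmas(s1, n1, s2, n2, r12=None):
    """(27b) when r12 is None (independent samples); otherwise (27a), assuming n1 == n2."""
    a, b = se_sigma(s1, n1), se_sigma(s2, n2)
    if r12 is None:
        return sqrt(a ** 2 + b ** 2)
    return sqrt(a ** 2 + b ** 2 - 2 * r12 ** 2 * a * b)

# Hypothetical summary values
independent = se_diff_sigmas(10.0, 200, 12.0, 200)
correlated = se_diff_sigmas(10.0, 200, 12.0, 200, r12=0.60)
print(f"CR if independent: {(12.0 - 10.0) / independent:.2f}")
print(f"CR if correlated:  {(12.0 - 10.0) / correlated:.2f}")
```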

The difference between medians based on correlated scores cannot be tested because the needed $r$ is unknown, but for independent samples we have

$$\sigma_{D_{mdn}} = \sqrt{\sigma_{mdn_1}^2 + \sigma_{mdn_2}^2}$$

Expressions for $\sigma_{D_{MD}}$ and for $\sigma_{D_Q}$ can be similarly written for the case of independent samples.

Any student who is worried because formula (20), on p. 59, for the standard error of the difference between correlated proportions does not include an $r$ term may rest assured that the correlation has been allowed for even though not visibly so. Formula (20) is analogous to formula (25a), which we have seen is equivalent to the longer formula (25b), in which there is an $r$.

REDUCTION OF SAMPLING ERRORS 

One of the aims of scientific method is to attain as great precision in results as is practicable. In statistical work this can be accomplished by increasing the accuracy or dependability of the scores or individual measurements or responses, and by decreasing the chance sampling errors of the various descriptive measures. One way to reduce sampling errors is to employ either the stratified or the area method of sampling, both of which are too complicated for us to discuss here. If the random sampling method is being used in projects which aim to study the difference between groups (or populations), the obvious, and only, way of decreasing the standard error of the difference is to increase $N$ for either or for both samples. Most field investigations are of this type.

In contrast, the experimentalist can define his population with reference to 2 laboratory or experimental situations, i.e., a population of individuals under situation A and a population of individuals under situation B; his sample individuals for the 2 situations may be the same individuals, first under the A and then under the B condition. In general, the use of the same individuals, if feasible in view of possible practice or fatigue effects, will usually involve a fairly high degree of correlation, the net effect of which is to reduce the standard error of the difference considerably; that is, it is sometimes possible to reduce sampling error simply by using the same individuals as the "two" samples. Thus, if we wish to study the effect of 2 different degrees of humidity on mental output or efficiency, it will be a more economical and better controlled experiment if we make observations on the same individuals under the 2 conditions A and B, rather than on $N_1$ individuals under condition A and $N_2$ individuals under condition B.

If it is not feasible to use the same individuals in the 2 experimental situations, we can make up 2 groups by pairing or matching individuals on the basis of 1 or more characteristics. Such a procedure leads to more nearly comparable groups for our experiment than can be obtained by choosing individuals at random and, by using either formula (25a) or (25b) instead of (26), we can make allowance for the fact that the individuals for the 2 samples have not been chosen independently. The use of individuals who have been paired is considered good experimental technique; it cannot be said that a found difference between means for the variable being studied may be due to a lack of comparability of the 2 groups with respect to the matching variables. The use of paired individuals has a statistical as well as experimental advantage in that the sampling error of the difference between means is thereby reduced without the necessity of increasing the number of cases. If pairing produces an r of .75, the reduction in the standard error of the difference is equivalent to that achieved by quadrupling the number of cases when the random method of forming groups is employed. After the student has learned about correlation he will better appreciate the fact that the gain in pairing depends upon the extent to which the variables used in pairing are correlated with the variable being studied.
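A small numerical check of the claim that pairing with r = .75 is worth quadrupling N. This is a sketch under stated assumptions: both groups are taken to have the same σ (the text does not say so), the correlated-groups standard error is written as $\sqrt{(\sigma_1^2 + \sigma_2^2 - 2r\sigma_1\sigma_2)/N}$, and the function names and numbers are hypothetical.

```python
import math


def se_diff_independent(sd1, sd2, n1, n2):
    """Standard error of M1 - M2 when the 2 groups are drawn independently."""
    return math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)


def se_diff_paired(sd1, sd2, n_pairs, r):
    """Standard error of M1 - M2 for N pairs whose scores correlate r."""
    return math.sqrt((sd1 ** 2 + sd2 ** 2 - 2 * r * sd1 * sd2) / n_pairs)


# Hypothetical values: both groups have sigma = 10, with 50 cases (or pairs).
sd, n = 10.0, 50
random_groups = se_diff_independent(sd, sd, n, n)         # sqrt(2 + 2) = 2.0
paired = se_diff_paired(sd, sd, n, r=0.75)                # sqrt(50 / 50) = 1.0
quadrupled_n = se_diff_independent(sd, sd, 4 * n, 4 * n)  # sqrt(0.5 + 0.5) = 1.0
print(random_groups, paired, quadrupled_n)
```

With equal σ's the paired variance is $2\sigma^2(1 - r)/N$, so r = .75 cuts the variance to one-fourth of its random-groups value, which is exactly what multiplying N by 4 does.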

It is thus seen that, for some types of investigations, greater precision can be obtained by judicious planning. If one had unlimited resources, he could always attain any desired degree of precision by simply taking sufficiently large samples.

Frequently the question is raised as to how many cases should be secured for a given study. The answer might be in terms of the number needed to reach a given degree of accuracy, but this in turn would raise the question of what degree of precision is needed, and this in turn depends on how small a difference we wish to detect. When group comparisons are made and when the N's are relatively small, the null hypothesis is apt to be accepted too often for the simple reason that a real difference has to be sizable before it is demonstrable by small samples. On the other hand, if a real difference is so small that its statistical demonstration requires thousands of cases, one may question whether it has practical or scientific importance.


COMPARISON OF CHANGES 

Although the comparison of changes involves nothing new in the way of statistical theory, such comparisons are somewhat more complicated than the tests of significance so far discussed. The researcher may be interested in either of 2 questions. First, he may wish to evaluate the effect of only 1 experimental condition or, second, he may wish to contrast the changes produced under 2 (or more) different experimental conditions.

For the first of these, a sample is selected and measurements are made prior to (pretest) and subsequent to (posttest) the provided condition. But since changes from a first to a second measure might occur because of practice effect or because of some other experience beyond the control of the investigator, it is necessary to set up a control group, the members of which are measured and then remeasured at chronological times corresponding as closely as possible to those of the pretest and posttest of the experimental group. It is presumed that all uncontrollable effects will be operating similarly on both groups, so that any difference in change for the 2 groups will have resulted from whatever was done to the members of the experimental group. The statistical problem is that of evaluating the change shown by the experimental group compared with that shown by the controls.

For the second type of question the investigator starts with 2 experimental groups, one of which is subjected to 1 experimental condition and the other to a second experimental condition, both groups having been measured prior to the experience (pretest) and then again after the experience (posttest). Since the question is concerned with contrasting gains (or losses) associated with the 2 conditions, a control group is not needed. Presumably, uncontrollable factors are alike for the 2 groups. The statistical analysis consists of testing for significance the difference between the changes shown by the 2 groups.

Whether we are dealing with a problem calling for an experimental and a control group or for 2 experimental groups, the 2 groups may be drawn at random or formed on the basis of the pairing of individuals on pertinent variables. If the groups are set up on the basis of pairing, we need to allow for that fact when determining the required standard error of the difference between changes.

Parenthetically, it may be said that the setup which involves an experimental and a control group (or 2 experimental groups) for studying shifts has led to a great deal of confusion regarding the proper statistical handling of the data. We have a total of 4 means, for the pretest and the posttest for each of the 2 groups. By using a combination of subscripts, 1 and 2 for the pretest and posttest, and E and C to represent the 2 groups, we can specify the means as $M_{E1}$, $M_{E2}$, $M_{C1}$, and $M_{C2}$. Not all the possible differences between these 4 will have meaning. Those that have meaning may be set forth as

$D_E = M_{E1} - M_{E2}$, the change shown by the experimental group.

$D_C = M_{C1} - M_{C2}$, the change shown by the control group.

$D_1 = M_{E1} - M_{C1}$, the pretest difference between the groups.

$D_2 = M_{E2} - M_{C2}$, the posttest difference between the groups.


Which of these 4 meaningful differences should we test for significance? Obviously, it is insufficient to test only $D_E$ because we can't be sure that the shift shown, even though nonchance, is really due to the interpolated experience. In fact, the reason for having the control group is to enable us to evaluate the shift which takes place as a result of causes other than the experimentally provided experience. Now it might be thought that if $D_E$ is significant while $D_C$ is less, or not at all, significant, an effect has been demonstrated. This type of comparison, however, does not provide a check on the net change. Some have argued that if $D_2$ is significant while $D_1$ is not, one can safely conclude that the interpolated experience has had an effect. This comparison also fails to test the net change. We should test the significance of the difference between the 2 changes, i.e., $D = D_E - D_C$, in order to gauge properly the net shift. Although, as regards absolute magnitude, $D_E - D_C$ will always equal $D_2 - D_1$ (since $D_E - D_C = D_1 - D_2$), it is easier to evaluate the former difference.
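The 4 meaningful differences can be illustrated with made-up means. In the sketch below the values are hypothetical and changes are taken as pretest minus posttest, as in the definitions above; it shows that the net change $D_E - D_C$ equals $D_1 - D_2$ and hence has the same magnitude as $D_2 - D_1$.

```python
# Hypothetical pretest (1) and posttest (2) means for the experimental (E)
# and control (C) groups.
m_e1, m_e2 = 42.0, 50.0
m_c1, m_c2 = 41.0, 45.0

d_e = m_e1 - m_e2  # change shown by the experimental group: -8.0
d_c = m_c1 - m_c2  # change shown by the control group: -4.0
d_1 = m_e1 - m_c1  # pretest difference between the groups: 1.0
d_2 = m_e2 - m_c2  # posttest difference between the groups: 5.0

net_change = d_e - d_c        # -4.0, the quantity to be tested
print(net_change, d_2 - d_1)  # same magnitude, opposite sign
```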

To get the standard error of $D$ ($= D_E - D_C$) when the groups have been independently drawn, we need the sampling variances of $D_E$ and $D_C$ so as to substitute in

$$\sigma_{D_E - D_C} = \sqrt{\sigma^2_{D_E} + \sigma^2_{D_C}}$$

Now since $D_E = M_{E1} - M_{E2}$ is the difference between 2 means based on the same persons, we could get the standard error of $D_E$ by using formula (25b); but since the difference between correlated means is equal to the mean difference, $M_{D_E}$, we can use formula (25a) to get the required $\sigma_{D_E}$. This same situation holds for the control group, so (25a) would also be used to get $\sigma_{D_C}$.
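For independently drawn groups the computation might be sketched as follows. The sketch assumes standard deviations computed with divisor N (as in the large-sample formulas), applies the (25a) idea of dividing the standard deviation of the change scores by $\sqrt{N}$, and combines the 2 results with $\sqrt{\sigma^2_{D_E} + \sigma^2_{D_C}}$. The scores are a hypothetical toy set, far too small for a real large-sample test; they only make the arithmetic visible.

```python
import math


def sd_n(values):
    """Standard deviation with divisor N, as in the large-sample formulas."""
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def mean_change_and_se(pre, post):
    """Mean change (pretest minus posttest) and its standard error, (25a)-style."""
    changes = [x1 - x2 for x1, x2 in zip(pre, post)]
    n = len(changes)
    return sum(changes) / n, sd_n(changes) / math.sqrt(n)


def net_change_independent(pre_e, post_e, pre_c, post_c):
    """Net change D_E - D_C, its standard error, and the critical ratio."""
    d_e, se_e = mean_change_and_se(pre_e, post_e)
    d_c, se_c = mean_change_and_se(pre_c, post_c)
    se_net = math.sqrt(se_e ** 2 + se_c ** 2)
    return d_e - d_c, se_net, (d_e - d_c) / se_net


# Hypothetical toy scores for an experimental and a control group.
pre_e, post_e = [10, 12, 14, 16], [13, 14, 18, 19]
pre_c, post_c = [11, 12, 13, 14], [12, 12, 14, 15]
net, se_net, cr = net_change_independent(pre_e, post_e, pre_c, post_c)
print(net, round(se_net, 4), round(cr, 2))
```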

If the experimental and control groups have been formed by pairing, our standard error of the difference between changes will require an r term to enable us to take advantage of the fact that we have a better controlled experiment. The required r is the correlation between the changes shown by the members of the pairs; to compute it we need to consider the paired changes. We can, however, get the standard error of the difference by way of the algebraic difference between the changes shown by the members of the pairs, without computing an r.

Let $X_1$ and $X_2$ stand for pretest and posttest scores, and let the members of the Jth pair be designated as J and J', with J assigned to the experimental, and J' to the control, group. Each individual will have a change score which is nothing more than his pretest score minus his posttest score. Thus the change scores for the members of the Jth pair will be

$$C_J = D_J = X_{1J} - X_{2J} \quad\text{and}\quad C_{J'} = D_{J'} = X_{1J'} - X_{2J'}$$

Hence the difference between the changes (or differences) shown by the members of any pair will be

$$D = (C_J - C_{J'}) = (D_J - D_{J'}) = (X_{1J} - X_{2J}) - (X_{1J'} - X_{2J'})$$

For N pairs we will have N D's. These D's are tedious to compute since one must preserve the same direction for each subtraction and keep track of signs. The process can be made somewhat simpler by removing the parentheses, thus

$$D = X_{1J} - X_{2J} - X_{1J'} + X_{2J'}$$

Simply add $X_{1J}$ and $X_{2J'}$ and then subtract the sum of $X_{2J}$ and $X_{1J'}$, with the sign for D depending on whether the first or the second of these 2 sums is the larger.

Once the N D's have been determined, we can get $M_D$, $\sigma_D$, and thence $\sigma_{M_D}$ by formula (25a). This $M_D$ will equal $D_E - D_C$, or $(M_{E1} - M_{E2}) - (M_{C1} - M_{C2})$, and this $\sigma_{M_D}$ will be exactly the same as

$$\sigma_{D_E - D_C} = \sqrt{\sigma^2_{D_E} + \sigma^2_{D_C} - 2r\,\sigma_{D_E}\,\sigma_{D_C}}$$

where r is the correlation between the changes shown by the members of the pairs. After the student has learned how to compute r, he may prefer to use this longer formula for $\sigma_{D_E - D_C}$ (equivalent to $\sigma_{M_D}$) rather than go through the tedium of differencing differences. Regardless of how the standard error of the difference is obtained, one tests the null hypothesis by calculating an $x/\sigma$ (or CR) as the net difference between the 2 changes divided by its standard error. The foregoing procedures are also applicable when one is dealing with 2 experimental conditions. One needs only to use appropriate subscripts in place of E and C.
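For paired groups, here is a sketch of both routes: the difference-of-differences method and the longer formula with r, which should give the same standard error. The toy scores are hypothetical (position k in each list is one pair, J in the experimental and J' in the control group), and standard deviations use divisor N.

```python
import math


def mean(v):
    return sum(v) / len(v)


def sd_n(v):
    """Standard deviation with divisor N."""
    m = mean(v)
    return math.sqrt(sum((x - m) ** 2 for x in v) / len(v))


def paired_net_change(pre_e, post_e, pre_c, post_c):
    """Difference-of-differences route: one D per pair, then formula (25a)."""
    # D = (X1J + X2J') - (X2J + X1J'), with J experimental and J' control.
    d = [(x1j + x2jp) - (x2j + x1jp)
         for x1j, x2j, x1jp, x2jp in zip(pre_e, post_e, pre_c, post_c)]
    m_d = mean(d)
    se_m_d = sd_n(d) / math.sqrt(len(d))
    return m_d, se_m_d, m_d / se_m_d


def paired_se_via_r(pre_e, post_e, pre_c, post_c):
    """Longer route: standard errors of the 2 mean changes and their r."""
    ce = [a - b for a, b in zip(pre_e, post_e)]
    cc = [a - b for a, b in zip(pre_c, post_c)]
    n = len(ce)
    mce, mcc = mean(ce), mean(cc)
    cov = sum((x - mce) * (y - mcc) for x, y in zip(ce, cc)) / n
    r = cov / (sd_n(ce) * sd_n(cc))
    se_e, se_c = sd_n(ce) / math.sqrt(n), sd_n(cc) / math.sqrt(n)
    return math.sqrt(se_e ** 2 + se_c ** 2 - 2 * r * se_e * se_c)


pre_e, post_e = [10, 12, 14, 16], [13, 14, 18, 19]
pre_c, post_c = [11, 12, 13, 14], [12, 12, 14, 15]
m_d, se_m_d, cr = paired_net_change(pre_e, post_e, pre_c, post_c)
print(m_d, round(se_m_d, 4), round(paired_se_via_r(pre_e, post_e, pre_c, post_c), 4))
```

Both routes give the same $\sigma_{M_D}$; the first avoids computing r, the second avoids forming the D's.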

The general pattern outlined on pp. 90-92 holds for the comparison of changes for attributes when the groups have been independently drawn. We merely substitute p's (proportions) in place of the M's. Thus we would have

$$D_E = p_{E1} - p_{E2} \quad\text{and}\quad D_C = p_{C1} - p_{C2}$$

as changes in proportions for the experimental and control groups. The variance error for $D_E$ (and also $D_C$) is obtained by formula (20) on the basis of the tabulation scheme set forth on p. 57.

INFERENCE: ESTIMATION 

So far we have discussed statistical inference from the viewpoint of hypothesis testing, but there are occasions when one may wish to use information from a sample as a basis for estimating population values. There are 2 general types of estimation: point and interval. We shall discuss the first briefly in order to introduce some concepts which the student might encounter, and the second because of its practical implications.

Point estimation. We may regard a sample statistic as an estimator for the corresponding population value (parameter). How “good” an estimator it is depends on whether or not it is unbiased and consistent, and on its relative efficiency.

An estimator is said to be unbiased if the average of a large number of sample estimates tends to equal the parameter being estimated. The mean is unbiased because the mean of sample means will approach nearer and nearer $M_{pop}$ as we take more and more samples, but $\sigma^2$ defined as $\Sigma x^2/N$ is biased in that the mean of sample variances tends to be smaller than the population variance. An unbiased estimate of $\sigma^2_{pop}$ is given by $s^2 = \Sigma x^2/(N - 1)$, but for subtle mathematical reasons $s$, or $\sqrt{\Sigma x^2/(N - 1)}$, involves a slight bias as an estimator of the population standard deviation. Note that the bias is small when N is large.
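The bias of $\Sigma x^2/N$, and the slight bias that remains in s, can be seen by simulation. This sketch assumes a normal population with σ = 1 and very small samples (N = 5) so that the effect is large: the average of $\Sigma x^2/N$ should come out near $(N - 1)/N = .8$, the average of $s^2$ near 1, and the average of s a little below 1.

```python
import math
import random


def biased_var(xs):
    """sigma^2 computed as Sigma x^2 / N."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def unbiased_var(xs):
    """s^2 computed as Sigma x^2 / (N - 1)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


random.seed(1)
n, trials = 5, 20000  # hypothetical settings; population sigma = 1
samples = [[random.gauss(0.0, 1.0) for _ in range(n)] for _ in range(trials)]
avg_biased = sum(biased_var(s) for s in samples) / trials          # near .8
avg_unbiased = sum(unbiased_var(s) for s in samples) / trials      # near 1.0
avg_s = sum(math.sqrt(unbiased_var(s)) for s in samples) / trials  # a bit under 1.0
print(round(avg_biased, 3), round(avg_unbiased, 3), round(avg_s, 3))
```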

An estimator is said to be consistent if it approaches nearer and nearer the population value as sample size is increased indefinitely. All the measures so far discussed satisfy this criterion.

The efficiency of an estimator is a function of its sampling error. 
Thus, in terms of efficiency the sample mean is far better than 
the median as an estimator of the central value of a population of 
normally distributed scores even though both are unbiased and 
consistent estimators. 
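A simulation sketch of relative efficiency, assuming normal scores with σ = 1 and samples of N = 25 (both hypothetical choices): the standard deviation of many sample medians is compared with that of the sample means. For large N the ratio for normal data approaches $\sqrt{\pi/2} \approx 1.25$, and with N = 25 it should fall in the same neighborhood.

```python
import random
import statistics

random.seed(2)
n, trials = 25, 4000
means, medians = [], []
for _ in range(trials):
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]
    means.append(statistics.fmean(sample))
    medians.append(statistics.median(sample))

se_mean = statistics.pstdev(means)      # close to 1 / sqrt(25) = .2
se_median = statistics.pstdev(medians)  # noticeably larger
print(round(se_mean, 3), round(se_median, 3), round(se_median / se_mean, 2))
```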




Interval estimation: Confidence interval. Interval estimation, which takes into account the sampling error of an estimator, provides limits, or an interval, for the population value, and at a prescribed level of confidence. Given a sample mean and its standard error, one could set up a whole series of “trial” hypothesis values for the population mean. All trial hypothesis values well above and below the sample mean could be rejected at a high level (small P) of significance, but rejection would become more and more risky as we approached nearer and nearer the sample mean, and for a whole series of values near the sample mean all trial hypotheses would be acceptable. Now this implies that at some point above the sample mean and at some point below the sample mean we change from rejection to acceptance of the trial values. If we have adopted, say, the P = .05 level, the change will obviously be at $M \pm 1.96\sigma_M$. In rejecting trial values outside these limits and accepting values within these limits, we are in effect inferring that the population value is in an interval defined by these limits.
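The “trial hypothesis” argument can be mimicked directly: scan a grid of hypothetical population means, reject each one whose critical ratio reaches 1.96, and look at the smallest and largest values that survive. The sample mean of 50, $\sigma_M = 2$, and the grid settings are hypothetical; the surviving range should be, to within the grid step, $50 \pm 1.96(2)$, i.e., about 46.08 to 53.92.

```python
def accepted_trial_values(sample_mean, se, z_crit=1.96, step=0.01, span=10.0):
    """Trial population means within +/- span of the sample mean that are NOT rejected."""
    n_steps = int(round(2 * span / step))
    trials = [sample_mean - span + i * step for i in range(n_steps + 1)]
    return [mu for mu in trials if abs(sample_mean - mu) / se < z_crit]


# Hypothetical sample: M = 50 with sigma_M = 2, tested at the P = .05 level.
accepted = accepted_trial_values(50.0, 2.0)
print(round(min(accepted), 2), round(max(accepted), 2))
```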

It would seem that there should be some way of expressing our degree of confidence that the population mean lies between the limits $M \pm 1.96\sigma_M$, since, as we have seen, we can be somewhat sure that the sample mean is not a chance deviation from a population mean outside the limits so determined. Note that, given a population mean and sigma, we can legitimately speak of the probability of a sample mean falling in a specified region, but given a sample mean we cannot speak of the probability of the population mean being in a certain region (or interval), for the simple and compelling reason that $M_{pop}$, being definitely just 1 value, has no distribution. We can in no way enumerate events so as to conceive of a probability fraction, since just 1 event (value) is possible.

In order to arrive at a statement which expresses our degree of confidence, we note that, if we draw a second sample, we would be apt to have a different set of limits for the simple reason that the second sample mean may differ from the first. If we take additional samples of the same size, we would have a distribution of sample means, hence a sort of distribution of sets or pairs of limits, since each sample mean would provide a set. Our discussion can be greatly simplified by taking sets of limits given by $M \pm 2\sigma_M$ (as approximating the $M \pm 1.96\sigma_M$ values). For simplicity of exposition, let us assume that we are drawing successive samples from a population having a mean of 10, and that the variability and N are such that $\sigma_M$ can be taken as 2. Then $M \pm 2\sigma_M$ will be $M \pm 2(2)$, or $M \pm 4$. It will also facilitate our exposition if we think of the random sampling distribution of means in terms of intervals of $\frac{1}{2}\sigma_M$ distances on the base line, with the approximate percentage area for the several intervals, as shown in the top curve of Fig. 10.



[Fig. 10. Generation of confidence limits. Top: sampling distribution of sample means; bottom: LL curve and UL curve.]


Now each possible sample mean will lead to a lower limit of $M - 4$ and an upper limit of $M + 4$. If we consider the 19 per cent of sample means expected between 9 and 10, we see at once that these 19 will lead to intervals with lower limits between 5 and 6 and upper limits between 13 and 14. That is, the sample means falling between 9 and 10 will generate that part of the lower limit (LL) curve of Fig. 10 between 5 and 6 and that part of the upper limit (UL) curve between 13 and 14. Likewise the 15 per cent of sample means falling between 8 and 9 will lead to the 4 to 5 part of the LL curve and to the 12 to 13 part of the UL curve. Similarly, as can be seen by careful study (a requirement for most students if understanding is to be achieved) of the 3 curves of Fig. 10, every left-hand segment of the top curve generates a left-hand segment for each of the bottom curves. Stated differently, the left half of the top curve leads to a distribution of intervals with lower limits less than 6 and upper limits of less than 14. In exactly the same fashion it can be seen that the right half of the top curve leads to the right half of the LL curve and also the right half of the UL curve. Thus we have a sampling distribution of intervals (sets of limits) as found by taking $M \pm 4$ (or $M \pm 2\sigma_M$). Our next task is to ask how many of these various intervals actually include 10, or the population mean. Reference to Fig. 10 will verify that, out of 100 tries, we would expect to get:

4 times an interval with LL of 2 to 3 and UL of 10 to 11
9 times an interval with LL of 3 to 4 and UL of 11 to 12
15 times an interval with LL of 4 to 5 and UL of 12 to 13
19 times an interval with LL of 5 to 6 and UL of 13 to 14
19 times an interval with LL of 6 to 7 and UL of 14 to 15
15 times an interval with LL of 7 to 8 and UL of 15 to 16
9 times an interval with LL of 8 to 9 and UL of 16 to 17
4 times an interval with LL of 9 to 10 and UL of 17 to 18


Notice that for every set of limits in the foregoing groups the population mean lies in the range or interval defined by the upper and lower limits of the set. When we sum these expected frequencies, we see that 94 per cent of the sets of limits lead to intervals within which the population mean lies. If we had not rounded to the nearest per cent, these would sum to 95.45 per cent. This implies that 4.55 per cent of the time the intervals so defined would not include the population value. This can be verified by noting that sample means of less than 6 (top curve) lead to upper limits of less than 10, and do so 2.27 per cent of the time, whereas sample means of more than 14 produce lower limits of more than 10 about 2.27 per cent of the time. These percentages are for the tails of the bottom curves, to the left of the ordinate at 10 for the UL curve and to the right of this ordinate for the LL curve.
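The percentages read from Fig. 10 can be recomputed from the normal curve with Python's standard `statistics.NormalDist`. The sketch uses the text's example ($M_{pop} = 10$, $\sigma_M = 2$): the rounded percentages reproduce the 94 listed above, the unrounded coverage is 95.45 per cent, and each tail is about 2.275 per cent (the text truncates this to 2.27).

```python
from statistics import NormalDist

pop_mean, se = 10.0, 2.0  # the text's example: M_pop = 10, sigma_M = 2
sampling = NormalDist(pop_mean, se)

rows = []
for lo in range(6, 14):  # unit intervals of sample means, each 1/2 sigma_M wide
    pct = 100 * (sampling.cdf(lo + 1) - sampling.cdf(lo))
    # (rounded per cent, LL range, UL range) generated by means in this interval
    rows.append((round(pct), (lo - 4, lo - 3), (lo + 4, lo + 5)))
for row in rows:
    print(row)

rounded_total = sum(pct for pct, _, _ in rows)  # the text's 94 per cent
coverage = 100 * (sampling.cdf(pop_mean + 2 * se) - sampling.cdf(pop_mean - 2 * se))
one_tail = 100 * sampling.cdf(pop_mean - 2 * se)  # means below 6, so UL below 10
print(rounded_total, round(coverage, 2), round(one_tail, 3))
```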

In summary, if one were to make in his lifetime 100 inferences concerning population means on the basis of sample values, by each time taking the limits as $M \pm 2\sigma_M$, the limits so established would include the population value about 95 per cent of the times. That is, in the long run he would be correct about 95 per cent of the time in concluding that the population value is within the intervals so determined, and about 5 per cent of the time he would be in error. If he used $M \pm 1.96\sigma_M$ for setting limits, he would be correct 95 per cent, and in error 5 per cent, of the time. When we take $M \pm 1.96\sigma_M$ as a confidence interval, the degree of faith in such limits is represented by a P of .95; i.e., the level of confidence for such an inference is represented by a probability-type figure of .95. If we wish to be surer of our inferences, we might choose the .99 level of confidence, which in practice can be attained by taking $M \pm 2.58\sigma_M$ as limits.
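The long-run interpretation can be checked by simulation. In this sketch the population (mean 100, σ = 15), the sample size of 50, and the number of trials are hypothetical, and $\sigma_M$ is computed from the known population σ rather than estimated from each sample. The proportion of intervals that include the population mean should come out close to .95 and .99.

```python
import random
from statistics import NormalDist

random.seed(3)
pop_mean, pop_sd, n = 100.0, 15.0, 50  # hypothetical population and sample size
se = pop_sd / n ** 0.5                  # sigma_M, treated as known in this sketch
trials = 10000

results = {}
for level in (0.95, 0.99):
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 and 2.58
    hits = 0
    for _ in range(trials):
        m = sum(random.gauss(pop_mean, pop_sd) for _ in range(n)) / n
        if m - z * se <= pop_mean <= m + z * se:
            hits += 1
    results[level] = hits / trials
    print(level, round(z, 2), results[level])
```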

The limits set by the confidence interval method are so very 
similar to fiducial limits, and the level of confidence, sometimes 
referred to as the confidence coefficient, is so much like fiducial 
probability that the beginning student can well let the mathe- 
matical statistician worry about the theoretical difference between 
what seems to be 2 ways of doing the same thing. 

Confidence intervals can be set up for statistical measures other than the mean, but if the random sampling distribution of a given measure is nonnormal, the method will not be the simple stunt of taking $S \pm 1.96\sigma_S$ or $S \pm 2.58\sigma_S$, where S stands for any statistical measure. It should be obvious that, since the standard errors for all statistical measures are a function of N, it is possible by increasing the sample size to narrow the confidence interval without any loss in the degree of confidence with which we accept the limits.

Confidence interval for a difference. There are times when it is desirable not only to know whether a difference is significant but also to specify limits for the population difference. Such specification does not presume that a significant difference has been found. Even when a difference fails to reach significance, the specification of confidence limits gives one some idea of the possible difference between population values, and such information may help answer the nonstatistical question of whether the population difference is apt to be large enough to be of practical or scientific importance. This procedure may be helpful in evaluating the consequences of accepting the null hypothesis when the hypothesis is in reality false.

Furthermore, the setting up of a confidence interval may be particularly helpful when we have obtained a difference which is highly significant. Consider the case of a difference of 4.78 inches in mean height between men and their sisters. Because of large N and the presence of brother-sister correlation, the standard error of the difference is very small; its value is about .07. When we compute $D/\sigma_D$ we have a critical ratio of 68. This would, if we could evaluate it, yield a probability, for as large a difference by chance, which would be so microscopically small that we could not comprehend it. However, when we set confidence limits at, say, the .99 level, we have $4.78 \pm 2.58(.07)$, or 4.60 and 4.96, as limits for the population difference. This permits a down-to-earth way of evaluating the obtained difference.
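The brother-sister height example works out as follows; the helper name is illustrative, and the multiplier 2.58 is the .99 value used in the text.

```python
def confidence_limits(estimate, standard_error, z=2.58):
    """Limits estimate +/- z * standard error (z = 2.58 for the .99 level)."""
    half_width = z * standard_error
    return estimate - half_width, estimate + half_width


d, se_d = 4.78, 0.07  # height difference in inches and its standard error
critical_ratio = d / se_d
lower, upper = confidence_limits(d, se_d)
print(round(critical_ratio), round(lower, 2), round(upper, 2))  # 68 4.6 4.96
```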

Level of confidence vs. level of significance. The term “level of confidence” should not, as is frequently the case, be misused in place of “level of significance.” The first term pertains to interval estimation, the other to hypothesis testing.

QUESTION OF ASSUMPTIONS 

It may be well to consider briefly the assumptions underlying the procedures so far discussed for making statistical inferences, since assumptions restrict the applicability of a method.

Independence of sampling units. It is assumed that the conditions of random sampling hold, but the frequency with which the requirement of independence is violated by researchers suggests that a warning is needed. The violation usually comes about when one makes multiple measurements or observations on each of the individuals in a sample and treats each measurement (or response) as a sample value, thereby inflating N n-fold when n repeated measurements (or responses) are available for each person. The lack of independence comes about in that, for instance, if the sample of individuals happened to include 1 high-scoring person there would automatically be n high scores. The effect of such an inflation of N is an illegitimate reduction in standard errors.
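A deliberately extreme sketch of how counting repeated responses as separate cases shrinks the standard error. It assumes (hypothetically) that each of 8 people gives an identical score on all 4 occasions, so the repeated responses add no information at all, yet the computed standard error of the mean drops by a factor of $\sqrt{4} = 2$.

```python
import math
import statistics


def se_mean(values):
    """sigma / sqrt(N), with sigma computed using divisor N."""
    return statistics.pstdev(values) / math.sqrt(len(values))


people = [12, 15, 9, 20, 14, 11, 17, 13]  # hypothetical: 1 score per person
n_repeats = 4
# Each person is assumed to respond identically on 4 occasions, and every
# response is (wrongly) counted as a separate case: N goes from 8 to 32.
inflated = [score for score in people for _ in range(n_repeats)]

shrink = se_mean(people) / se_mean(inflated)
print(len(people), len(inflated), shrink)  # shrink is sqrt(4), up to rounding
```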

Infinite vs. finite universe. If we are sampling from a finite universe, particularly a universe with a rather small number of cases, it seems reasonable to think that as the sample size becomes large relative to the number of cases in the universe the sample mean, for example, will tend to fluctuate less from the universe mean than is the case when drawing from an infinite population. This suggests that the standard error formulas need to be modified for the finite population situation. The required modifications are available for only a few statistical measures. If we let N represent the sample size and $N_{pop}$ the size of the finite universe, the standard errors for the mean and for a proportion are as follows:

$$\sigma_M = \frac{\sigma}{\sqrt{N}}\sqrt{1 - N/N_{pop}} \quad\text{and}\quad \sigma_p = \sqrt{\frac{pq}{N}}\sqrt{1 - N/N_{pop}}$$

In a given research it is sometimes difficult to decide whether the universe being sampled is finite or infinite in size and, if finite, it is not always easy to determine the value of $N_{pop}$. It might be argued that psychologists never study an infinite universe. It can readily be seen, however, that the corrective factor in the sampling error formulas becomes negligible when $N_{pop}$ is large. Thus, if $N_{pop}$ is known to be large relative to N, it matters little whether the given universe is wrongly conceived as being infinite. For example, when N is .01 of $N_{pop}$, the corrective term leads to a reduction in the sampling error of about .005 of the value obtained by the ordinary formulas.
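The corrective factor $\sqrt{1 - N/N_{pop}}$ can be wrapped in small helpers; the function names are hypothetical. The loop shows how little the factor matters when N is a small fraction of $N_{pop}$ (a reduction of about .005 at $N = .01 N_{pop}$) and how much it matters when the sample is half the universe.

```python
import math


def fpc(n, n_pop):
    """Finite-universe correction factor sqrt(1 - N / N_pop)."""
    return math.sqrt(1 - n / n_pop)


def se_mean_finite(sd, n, n_pop):
    """Standard error of a mean drawn from a finite universe."""
    return sd / math.sqrt(n) * fpc(n, n_pop)


def se_prop_finite(p, n, n_pop):
    """Standard error of a proportion drawn from a finite universe."""
    return math.sqrt(p * (1 - p) / n) * fpc(n, n_pop)


for fraction in (0.01, 0.10, 0.50):
    factor = fpc(fraction * 1000, 1000)
    # factor, and the proportional reduction in the ordinary standard error
    print(fraction, round(factor, 4), round(1 - factor, 4))
```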

These formulas for the finite universe situation are frequently useful when we wish to compare a subgroup with a total group which contains the subgroup. Such a comparison is sometimes erroneously made by taking $\sqrt{\sigma_t^2/N_t + \sigma_s^2/N_s}$ as the standard error of the difference between the subgroup mean, $M_s$, and the total mean, $M_t$. This makes no allowance for the fact that the 2 means are not based on independent groups. An appropriate procedure is to regard $M_s$ as based on a sample drawn from a finite universe of $N_t$ cases with mean and standard deviation of $M_t$ and $\sigma_t$; then, with the standard error of $M_s$ taken as

$$\sigma_{M_s} = \frac{\sigma_t}{\sqrt{N_s}}\sqrt{1 - N_s/N_t}$$

we can test the significance of the deviation of $M_s$ from $M_t$ by using the ratio $(M_s - M_t)/\sigma_{M_s}$, which is interpretable as a critical ratio, or CR. This ratio will give a very close approximation to the CR which would be obtained if we were to compare the subgroup with the remainder (the total cases less the subgroup) as 2 independent groups, using the usual formula for the standard error of the difference. The foregoing scheme would also be applicable in case proportions instead of means were the descriptive measures used as a basis for comparison.
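A sketch of the subgroup-versus-total comparison. The scores are hypothetical, generated from evenly spaced normal quantiles so the result does not depend on a random seed: a subgroup of 100 cases shifted upward by .5 and a remainder of 300. The finite-universe CR and the 2-independent-groups CR should come out close to each other.

```python
import math
from statistics import NormalDist, fmean, pstdev

# Hypothetical scores from evenly spaced normal quantiles (reproducible, no seed):
# a subgroup of 100 shifted up by .5 and a remainder of 300.
unit = NormalDist()
subgroup = [unit.inv_cdf((i + 0.5) / 100) + 0.5 for i in range(100)]
remainder = [unit.inv_cdf((i + 0.5) / 300) for i in range(300)]
total = subgroup + remainder

n_s, n_t = len(subgroup), len(total)
m_s, m_t = fmean(subgroup), fmean(total)

# Subgroup regarded as a sample from the finite universe formed by the total group.
se_m_s = pstdev(total) / math.sqrt(n_s) * math.sqrt(1 - n_s / n_t)
cr_finite = (m_s - m_t) / se_m_s

# Subgroup vs. remainder treated as 2 independent groups.
se_diff = math.sqrt(pstdev(subgroup) ** 2 / n_s + pstdev(remainder) ** 2 / len(remainder))
cr_independent = (m_s - fmean(remainder)) / se_diff

print(round(cr_finite, 2), round(cr_independent, 2))
```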

Skewed distributions. The standard error formulas given in this chapter assume normal or nearly normal score distributions for the population being sampled. Skewness is the most frequently encountered evidence for nonnormality, and accordingly it is of interest to consider the effect of skewness on the sampling distribution of the mean, the measure most apt to be involved in testing hypotheses. The relationship between the degree of skewness, $g_1$, for a variable and the amount of skewness for the sampling distribution of means is $g_M = g_1/\sqrt{N}$. Thus the skewness in the distribution of means rapidly disappears as N is taken larger and larger. For example, if $g_1$ is .77 (see Fig. 6, p. 30) and N is 35, the skewness for the sampling distribution of means will be only .13 (see Fig. 6 again). Accordingly, the procedures in this chapter may be safely used with moderately skewed distributions when N is large and with markedly skewed distributions when N is very large. Some methods for handling nonnormal data will be discussed in Chapter 18.
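The relation $g_M = g_1/\sqrt{N}$ is easy to apply, and a simulation can confirm it. The second part of this sketch assumes an exponential population (a markedly skewed distribution whose $g_1$ is 2) and samples of 25, for which the skewness of the sample means should be near $2/\sqrt{25} = .4$; the seed and trial count are arbitrary.

```python
import math
import random


def skew_of_means(g1, n):
    """Skewness of the sampling distribution of means: g_M = g_1 / sqrt(N)."""
    return g1 / math.sqrt(n)


def sample_skewness(xs):
    """Moment skewness m3 / m2**1.5 of a list of numbers."""
    m = sum(xs) / len(xs)
    m2 = sum((x - m) ** 2 for x in xs) / len(xs)
    m3 = sum((x - m) ** 3 for x in xs) / len(xs)
    return m3 / m2 ** 1.5


print(round(skew_of_means(0.77, 35), 2))  # 0.13, as in the text

# Simulation check with a markedly skewed population (exponential, g1 = 2).
random.seed(4)
n, trials = 25, 20000
means = [sum(random.expovariate(1.0) for _ in range(n)) / n for _ in range(trials)]
print(round(sample_skewness(means), 2), skew_of_means(2.0, n))
```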

A FURTHER WORD ON PROPORTIONS 

The student will have noted that the general principles of statistical inference set forth in Chapter 5 have been utilized and extended in the present chapter. There are many points of obvious similarity in the 2 chapters, but there is an additional parallelism which is not obvious. For an attribute involving a dichotomy such as yes-no, like-dislike, pass-fail, etc., we may arbitrarily assign a score of 1 to one category and a score of 0 to the other. That is, X = 0 or 1.

Let $f_0$ and $f_1$ stand for the frequency of, say, no and yes responses respectively in a sample of N cases. Thus we have a miniature frequency distribution, with the 2 categories being analogous to 2 intervals. Let's consider the mean and standard deviation of this miniature frequency distribution, both in terms of gross score formulas. Notice that in Table 9 we have a score column, $X$, a frequency column, $f$, and $fX$ and $fX^2$ columns (analogous to $fd$ and $fd^2$, with $d = X$).

Table 9. Scheme for Mean and Standard Deviation of a Dichotomous Variable

Response    $X$    $f$      $fX$                $fX^2$
Yes          1     $f_1$    $f_1(1)$            $f_1(1)^2$
No           0     $f_0$    $f_0(0)$            $f_0(0)^2$
Sums               $N$      $f_1 = \Sigma X$    $f_1 = \Sigma X^2$

It will be seen that $\Sigma X = f_1$; hence the mean of the distribution is $M = \Sigma X/N = f_1/N = p$, where p is the proportion of yeses. Hence a proportion may be regarded as a mean.

It will also be seen that $\Sigma X^2 = f_1$; hence when we utilize formula (6b), p. 25, to write the variance of the distribution we have

$$\sigma^2 = \frac{\Sigma X^2}{N} - \frac{(\Sigma X)^2}{N^2} = \frac{f_1}{N} - \frac{f_1^2}{N^2} = p - p^2 = p(1 - p) = pq$$

Hence $\sigma = \sqrt{pq}$ as the standard deviation of the dichotomous distribution. (Any connection with the M and σ for the binomial?) In this chapter we have given $\sigma_M = \sigma/\sqrt{N}$ as the standard error of a mean. If this holds for the dichotomous distribution we would have $\sigma_M = \sqrt{pq}/\sqrt{N} = \sqrt{pq/N}$. But this is the same as $\sigma_p$ given by formula (18b), p. 64, of Chapter 5. This is as it should be, since p = M for the dichotomous distribution. Furthermore, formula (20) for the standard error of the difference between correlated proportions has its analogue in the development on pp. 83-85 for the difference between correlated means, and formula (21) involves a pattern similar to that of formula (26).
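The 0-1 scoring can be checked numerically with a hypothetical sample of 30 yeses and 20 noes: the ordinary mean of the scores is p, the standard deviation (divisor N) is $\sqrt{pq}$, and $\sigma/\sqrt{N}$ coincides with $\sqrt{pq/N}$.

```python
import math
import statistics

# Hypothetical sample: 30 "yes" responses scored 1 and 20 "no" responses scored 0.
scores = [1] * 30 + [0] * 20
n = len(scores)

p = statistics.fmean(scores)          # the mean of the 0/1 scores is the proportion p
sigma = statistics.pstdev(scores)     # divisor N, so this is sqrt(p * q)
se_mean = sigma / math.sqrt(n)        # sigma / sqrt(N), the standard error of a mean
se_prop = math.sqrt(p * (1 - p) / n)  # sqrt(pq / N), the standard error of a proportion

print(p, round(sigma, 4), round(se_mean, 4), round(se_prop, 4))
```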


NOTE ON THE PROBABLE ERROR 

An antiquated procedure is the use of the probable error, instead of the standard error, in connection with sampling. The pe of the mean is $.6745\sigma_M$, and therefore we would expect 50 per cent of successive sample means to fall between $M_{pop} \pm pe_M$. Similarly, the pe for any other statistical measure is .6745 times its standard error. Since no additional information is yielded by multiplying the standard error by a constant, the continuance of this practice is being discouraged. The student who attempts to survey the research literature on a given topic is apt to encounter pe's, and he therefore must know the relationship of the pe to the standard error.
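The constant .6745 is simply the normal deviate that cuts off the middle 50 per cent of the curve. The sketch recovers it with `statistics.NormalDist` and adds 2 hypothetical helper functions for converting between the probable errors found in older literature and standard errors.

```python
from statistics import NormalDist

unit = NormalDist()
pe_factor = unit.inv_cdf(0.75)  # .6745 to 4 places
middle_half = unit.cdf(pe_factor) - unit.cdf(-pe_factor)
print(round(pe_factor, 4), round(middle_half, 4))  # 0.6745 0.5


def probable_error(standard_error):
    """Convert a standard error to the older probable error (pe = .6745 x SE)."""
    return pe_factor * standard_error


def standard_error_from_pe(pe):
    """Recover the standard error from a reported probable error."""
    return pe / pe_factor
```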



CHAPTER 7

Small Sample or t Technique 


Although the general principles of statistical inference are the same for both large and small samples, the techniques differ. We shall confine our attention in this chapter to the technique for dealing with a single mean and with the difference between 2 means. Chapter 14 will deal with inferences concerning variabilities.

It will be recalled that the sampling distribution of the mean is normal when the trait distribution is normal. This holds regardless of sample size. The sampling distribution of means centers at the population mean with a true standard deviation $\sigma_{pop}/\sqrt{N}$, which sigma we termed the true standard error of the mean. Recall also that the relative deviates, $(M - M_{pop})/\sigma_M$, follow the unit normal curve. When successive samples are drawn and a $\sigma_M$ is computed for each sample by using the sample SD instead of $\sigma_{pop}$ (an unknown), the ratios of given $(M - M_{pop})$'s to their $\sigma_M$ values so computed will be distributed normally for very large N's and approximately so for N's of moderate size, but for N's as small as 30 the approximation is none too good. The value 30 is arbitrarily chosen; the approximation to normality becomes progressively worse as we go from large to small N's rather than becoming abruptly worse in the vicinity of N = 30.

We have already mentioned the fact that <r® = S»®/JV suffers 
from bias, whereas_s^_S^/(iV — 1) is an unbiased estimator 
of the population variance. Smce the bias in a mcreases with a 
decre^ in JV, it is important to use the unbiased estimator when 
JV is small. We will accordingly use sm = s/-\/N, m place of 
<riif == as a nonnegligible improvement in the estimate of 

the standard error of a mean based on a small sample. Even so, 
the successive sample ratios, (JIf — Mpop)/8it, with Stf computed 

104 



The t Distribution 


105 


from each sample, will not follow the unit normal curve, because the sampling distribution of s (and also of $\sigma$) is skewed for N small. Hence the distribution of successive values of $s_M$ will be skewed. That is, the successive sample values of $(M - M_{pop})/s_M$ will involve a variable numerator, which is normally distributed, and a variable denominator, which has a skewed distribution. The distribution of the resulting ratios will be symmetrical about zero but will be less flat-topped than the normal curve, and the smaller the N, the more leptokurtic (less flat-topped) the shape of the curve. Another characteristic of the sampling distribution of $(M - M_{pop})/s_M$ is that the tails of the curve beyond ratios of about 2 tend to be higher than the tails of the normal curve; that is, there will be relatively more large ratios.
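A simulation makes the effect described above easy to see. The Python sketch below is an illustration only and rests on several assumptions: the population is normal with mean 0 and SD 1, the function name `ratio_tail_rate` and its defaults are hypothetical, and only the standard library is used. For each of many samples of size N, the sketch computes $(M - M_{pop})/s_M$ using the unbiased s. It then reports how often the ratio falls beyond ±1.96, the normal-curve .05 point.

```python
import math
import random


def ratio_tail_rate(n, reps=20000, cutoff=1.96, seed=1):
    """Proportion of samples of size n whose ratio (M - M_pop) / s_M falls
    beyond +/- cutoff, for samples drawn from a normal population with
    mean 0 and SD 1 (an assumption of this illustration)."""
    rng = random.Random(seed)
    pop_mean = 0.0
    beyond = 0
    for _ in range(reps):
        xs = [rng.gauss(pop_mean, 1.0) for _ in range(n)]
        m = sum(xs) / n
        # Unbiased estimate: divide the sum of squares by N - 1.
        s = math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))
        s_m = s / math.sqrt(n)
        if abs((m - pop_mean) / s_m) > cutoff:
            beyond += 1
    return beyond / reps


for n in (5, 10, 30, 200):
    print(n, ratio_tail_rate(n, reps=5000))
```

For N = 5 the proportion comes out well above .05 (roughly .12). For N = 200 it is close to .05. This is the "relatively more large ratios" behavior described in the text.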

The t distribution. It can be shown that such ratios, involving a normally distributed deviate divided by an unbiased estimate of its sampling error, will follow the so-called t distribution, defined by

$$ y = \frac{\Gamma\left(\frac{n+1}{2}\right)}{\sqrt{n\pi}\,\Gamma\left(\frac{n}{2}\right)}\left(1 + \frac{t^2}{n}\right)^{-(n+1)/2} $$

in which $\Gamma$ indicates the gamma function as defined in texts in advanced calculus. Although this equation will be beyond the mathematical comprehension of most students, three points should be noted:

- y is the height of a curve.
- Since t is squared, the distribution is symmetrical.
- The equation contains an n as yet undefined.

This n has to do with the number of degrees of freedom, a concept which is discussed below. Suffice it to say just now that n will be a function of sample size (or sizes). Accordingly, there will be not 1 but many distributions of t, one for each possible value of n.

Figure 11 shows the curve of t, when n = 7 and when n = 3, as compared to the normal curve. For n larger and larger, the curve of t approaches that of the normal distribution. Table E of the Appendix gives the values of t, for n's of 1 to 30, which will be exceeded by chance a specified proportion of times. Thus for n = 30 we see from Table E that the P = .05 point is at a t of 2.04, as compared to a normal deviate of 1.96. For n = 10,





Small Sample or t Technique 



the point corresponding to the .05 level is t = 2.23. The .01 level is at t = 2.75 for n = 30 and at 3.17 for n = 10, as compared with 2.58 for the normal curve.
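Values such as these can be checked numerically from the equation for the t curve. The sketch below uses Python and its standard library. It evaluates the height y for n df and integrates it by Simpson's rule to get the proportion of the curve beyond ±t. It then finds, by bisection, the t exceeded a given proportion of times. The function names, the 2000-step integration, and the search range of 0 to 100 are assumptions of this illustration. The sketch is a check on the tabled values, not a substitute for Table E.

```python
import math


def t_density(t, n):
    """Height y of the t curve with n degrees of freedom.
    Log-gamma is used so that large n does not overflow."""
    log_c = (math.lgamma((n + 1) / 2) - math.lgamma(n / 2)
             - 0.5 * math.log(n * math.pi))
    return math.exp(log_c) * (1 + t * t / n) ** (-(n + 1) / 2)


def two_tailed_p(t, n, steps=2000):
    """Proportion of the curve beyond -t and +t: 1 minus twice the area
    from 0 to t, found by Simpson's rule (steps must be even)."""
    h = t / steps
    total = t_density(0.0, n) + t_density(t, n)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, n)
    return 1 - 2 * total * h / 3


def critical_t(p, n, lo=0.0, hi=100.0):
    """The t exceeded (in either direction) a proportion p of the time,
    located by bisection on two_tailed_p."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if two_tailed_p(mid, n) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for n in (10, 30):
    print(n, round(critical_t(0.05, n), 2), round(critical_t(0.01, n), 2))
```

The call `critical_t(0.05, 30)` comes out near 2.04, and `critical_t(0.01, 10)` comes out near 3.17. Both agree with the Table E values quoted above.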

Degrees of freedom. The n of the equation for t, and in the t table, is the number of degrees of freedom (df) involved in the estimate of the population variance. The df depends on how many of the x's in $\Sigma x^2$ or $\Sigma(X - M)^2$ are "free to vary."

Suppose two scores, 3 and 5. Their mean is 4, and the sum of squares (of deviations) is $(3 - 4)^2 + (5 - 4)^2 = 2$. Now

$$ \Sigma x = \Sigma(X - M) = \Sigma X - \Sigma M = \Sigma X - NM = \Sigma X - N\frac{\Sigma X}{N} = 0, $$

always. Therefore, as soon as 1 of 2 deviations is known, the other x is determinable. Thus, if $x_1$ is $-1$, the other deviation, $x_2$, must satisfy the equation $-1 + x_2 = 0$. One deviation, and hence its square, can be thought of as dependent upon the other deviation. The other deviation has some independence, and therefore there is 1 degree of freedom.

Suppose that we have 3 scores, 3, 4, and X, which yield a mean of 4. The deviations must satisfy the requisite that they sum to zero; i.e., $(3 - 4) + (4 - 4) + (X - 4) = 0$. Thus 1 of the 3 deviations is fixed by the other 2; i.e., it is not independent of their values, because the 3 deviations must sum to zero.

It may be more enlightening to start with symbols for scores. Suppose that $X_1$, $X_2$, $X_3$, and $X_4$ represent 4 scores, and it is reported that their mean equals 40. How many of the 4 deviations can we assign at will? Stated in deviation units, we have
$(X_1 - 40) + (X_2 - 40) + (X_3 - 40) + (X_4 - 40)$ as a sum which must equal zero. It is readily apparent that only 3 deviations can "vary freely"; the fourth is fixed by the numerical values of the other 3. Hence df = 4 - 1; i.e., 1 degree of freedom in the deviations or their squares is lost because of the 1 restriction imposed. Actually, this restriction comes about because we are taking deviations about 1 constant, the mean, computed from the set of scores at hand. The df for a sum of squares (of deviations) about a mean is always $N - 1$ when N scores are used to compute the mean. In general, the df for a sum of squares equals the number of squares minus the number of restrictions imposed by constants computed from the data.
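The bookkeeping behind "free to vary" can be made concrete in a few lines of Python. In the sketch below, the function names are hypothetical. The last deviation is computed from the others, which reproduces the text's example: scores 3, 4, and X with a mean of 4 force X = 5.

```python
def last_deviation(other_deviations):
    """Deviations about a mean computed from the same scores sum to zero,
    so once all but one are known, the last is fixed."""
    return -sum(other_deviations)


def df_for_sum_of_squares(n_squares, n_restrictions):
    """Number of squares minus the restrictions imposed by constants
    (such as means) computed from the data."""
    return n_squares - n_restrictions


# The text's example: scores 3, 4, and X with a mean of 4.
third = last_deviation([3 - 4, 4 - 4])
x = 4 + third
print(third, x, df_for_sum_of_squares(3, 1))   # 1 5 2
```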

Note that the unbiased estimate of the population variance, $s^2 = \Sigma x^2/(N - 1)$, involves dividing by df, the number of degrees of freedom. This is a general rule.

Computation of $s^2$ or s. For N small, the mean and $s^2$ or s are readily computed from gross score formulas. Thus $M = \Sigma X/N$. To compute $s^2$ or s, we need $\Sigma x^2$ in terms of gross scores. This was given earlier (p. 25) as

$$ \Sigma x^2 = \frac{1}{N}\left[N\Sigma X^2 - (\Sigma X)^2\right] \qquad (6a) $$

Dividing this by $N - 1$ yields $s^2$, the square root of which is the required s. An easily derived relationship between $s^2$ and $\sigma^2$ is

$$ s^2 = \frac{N}{N - 1}\,\sigma^2 $$

Although we do not need a frequency distribution for purposes of computation, a distribution should be made anyway. It permits at least a rough check on the assumption that the scores have been drawn from a normally distributed population of scores.
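A minimal Python sketch of formula (6a) and the division by df follows. The score list is hypothetical. The standard library's `statistics.variance`, which also divides by $N - 1$, serves only as a cross-check.

```python
import math
import statistics


def sum_of_squares(scores):
    """Sigma x^2 from gross scores by formula (6a): [N*Sum(X^2) - (Sum X)^2] / N."""
    n = len(scores)
    total = sum(scores)
    total_sq = sum(x * x for x in scores)
    return (n * total_sq - total * total) / n


def unbiased_s(scores):
    """Return (s^2, s), dividing Sigma x^2 by df = N - 1."""
    s2 = sum_of_squares(scores) / (len(scores) - 1)
    return s2, math.sqrt(s2)


scores = [2, 4, 4, 4, 5, 5, 7, 9]          # hypothetical scores
n = len(scores)
s2, s = unbiased_s(scores)
sigma2 = sum_of_squares(scores) / n        # the biased large-sample variance

# s^2 = N / (N - 1) * sigma^2, and the library agrees with s^2.
assert math.isclose(s2, n / (n - 1) * sigma2)
assert math.isclose(s2, statistics.variance(scores))
```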

t for a mean. We can test the significance of M as a deviation from any hypothesized value for the mean, $M_h$. To do so, we take $t = (M - M_h)/s_M$ as an entry in Table E, with n, the df, equal to $N - 1$, and see whether the obtained t reaches the t value required for certain levels of significance. If the t does not reach the value required for the chosen level of significance, the deviation would be attributed to chance and the hypothesis accepted.
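The steps just described can be sketched in Python. The function name and the scores below are hypothetical. The sketch returns t and df only; the obtained t would still be compared with the Table E entry for $N - 1$ df.

```python
import math


def t_for_mean(scores, m_h):
    """Return (t, df) for testing the sample mean against a hypothesized
    mean m_h, using s_M = s / sqrt(N) with s based on df = N - 1."""
    n = len(scores)
    m = sum(scores) / n
    s = math.sqrt(sum((x - m) ** 2 for x in scores) / (n - 1))
    s_m = s / math.sqrt(n)
    return (m - m_h) / s_m, n - 1


scores = [1, 2, 3, 4, 5]            # hypothetical scores
t, df = t_for_mean(scores, m_h=0)
```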

Suppose one wishes to specify the confidence limits for the unknown population mean, with a level of confidence indicated
by P = .99. He first notes from the table of t how large t must be, for the given df, to correspond to the .01 probability level. Then M plus and minus the t so found, times $s_M$, will give the desired limits.

For example, suppose 9 cases yield a mean of 80 and a sum of squares of 1152. Dividing the sum of squares by df, or 8, we get $s^2 = 144$, s = 12 as an estimate of $\sigma_{pop}$, and $s_M = 12/\sqrt{9} = 4$. For 8 df we find from Table E that t = 3.355 for the .01 level. Then $80 \pm (3.355)(4)$ gives 66.58 and 93.42 as the .99 confidence limits for the population mean.

If we used the large-sample method of the previous chapter, we would have $\sigma^2 = 1152/9$, giving $\sigma$ as 11.31, from which we would get $\sigma_M = 11.31/\sqrt{9} = 3.77$. Since for the normal distribution a relative deviate of 2.575 corresponds to the .01 level, we have $80 \pm (2.575)(3.77)$, or 70.29 and 89.71, as the .99 confidence limits for the universe mean. These limits differ appreciably from those obtained above, when proper allowance was made for the smallness of the sample.
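The worked example can be reproduced with a short Python sketch. The function name and its `unbiased` switch are hypothetical conveniences. The critical values 3.355 and 2.575 are the ones quoted in the text.

```python
import math


def mean_confidence_limits(mean, ss, n, crit, unbiased=True):
    """Limits mean +/- crit * (standard error of the mean).

    unbiased=True  divides the sum of squares by N - 1 (s, small-sample method);
    unbiased=False divides by N (sigma, the large-sample method)."""
    divisor = n - 1 if unbiased else n
    sd = math.sqrt(ss / divisor)
    se = sd / math.sqrt(n)
    return mean - crit * se, mean + crit * se


# Worked example from the text: N = 9, mean 80, sum of squares 1152.
small = mean_confidence_limits(80, 1152, 9, crit=3.355)                  # t, 8 df, .01 level
large = mean_confidence_limits(80, 1152, 9, crit=2.575, unbiased=False)  # normal deviate
print([round(v, 2) for v in small])   # [66.58, 93.42]
print([round(v, 2) for v in large])   # [70.29, 89.71]
```

The wider small-sample interval is the allowance for estimating $\sigma_{pop}$ from only 9 cases.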

Difference between correlated means. It will be recalled that when we have 2 means based on the same individuals or on paired cases, the test of significance of the difference must make allowance for the fact that the 2 sets of scores are not random with respect to each other. In Chapter 6 we saw two ways of doing this: including the r term in the standard error of the difference, as in formula (25b), or working directly with the differences between paired scores. It was shown that $M_D = D_M$ and that $\sigma_{M_D} = \sigma_{D_M}$.

When we have small samples, it is easier to work with $s_D$, an estimate of the sigma of the distribution of differences between paired scores, and thence with $s_{M_D}$. To get the best estimate of the sampling error of $M_D$, we need the sum of squares of the deviations of the pair differences from the mean difference, i.e., $\Sigma(D - M_D)^2$. Dividing this sum by the proper df, $N - 1$, gives the best estimate of the variance of the universe distribution of differences; here N is the number of differences, or the number of paired scores. Let $s_D^2$ stand for this estimate. Then

$$ s_{M_D} = \frac{s_D}{\sqrt{N}} $$

The computation is straightforward. Each of the D's is the difference between 2 scores, the subtraction being made in the same direction for all, and the sum of squares, $\Sigma(D - M_D)^2$, is
obtained by formula (6a) with the X's replaced by D's, that is,

$$ \Sigma(D - M_D)^2 = \frac{1}{N}\left[N\Sigma D^2 - (\Sigma D)^2\right] $$

The D's are summed algebraically, and their squares are summed. After $s_{M_D}$ has been calculated, we get t as $M_D/s_{M_D}$. The hypothesis to be tested is that the universe value of $M_D$ is zero. The table of t is entered with the obtained t and with df = $N - 1$, to see whether t reaches a prescribed level of significance. Note that the df is 1 less than the number of D's, not 1 less than the total number of scores (see "Further note" on df's, p. 111).

The assumption of normality pertains to the D's. Hence, again, even though a frequency distribution is not needed for computational purposes, it should be made to provide a rough check on the assumption. A confidence interval for $M_D$ (and consequently $D_M$) can be set up in precisely the same manner as indicated above for a single mean.
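The procedure for correlated means can be sketched in Python as follows. The function name and the paired scores are hypothetical. The sum of squares of the D's is obtained by formula (6a) with D in place of X, as described above.

```python
import math


def paired_t(first, second):
    """t = M_D / s_MD for paired scores; returns (t, df) with df = pairs - 1.
    Differences are taken in the same direction (first - second) for every pair."""
    if len(first) != len(second):
        raise ValueError("each score needs a partner")
    d = [a - b for a, b in zip(first, second)]
    n = len(d)
    sum_d = sum(d)
    # Formula (6a) with D in place of X.
    ss_d = (n * sum(x * x for x in d) - sum_d ** 2) / n
    s_d = math.sqrt(ss_d / (n - 1))
    s_md = s_d / math.sqrt(n)
    return (sum_d / n) / s_md, n - 1


first_scores = [5, 7, 6, 9]      # hypothetical paired scores
second_scores = [3, 4, 5, 6]
t, df = paired_t(first_scores, second_scores)
```

Here df is 3, one less than the number of pairs, not one less than the 8 scores.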

Difference between independent means. Given 2 groups of $N_1$ and $N_2$ cases, suppose that we wish to test the significance of the difference, $D_M = M_1 - M_2$. By the procedure of the previous chapter for large N's, we would make the necessary calculations for determining $D_M/\sigma_{D_M}$, or CR. As an aid to transition in thought from CR to t, let us first write the expression for CR, thus

$$ CR = \frac{M_1 - M_2}{\sqrt{\sigma_{M_1}^2 + \sigma_{M_2}^2}} = \frac{M_1 - M_2}{\sqrt{\dfrac{\sigma_1^2}{N_1} + \dfrac{\sigma_2^2}{N_2}}} $$

which involves the 2 sample variances. Now, for the small-sample situation, we need $t = D_M/s_{D_M}$, where $s_{D_M}$ is to be the best possible estimate of the standard error of the difference. To get this, we apparently need the best possible estimates of the 2 variances of the 2 populations from which the samples have been drawn. But here we encounter an assumption underlying t for this situation: the 2 populations must have the same variance. Hence, we need just 1 estimate, an estimate of the variance common to the 2 populations. Calling this estimate $s^2$, by analogy with the CR technique we need

$$ t = \frac{D_M}{s_{D_M}} = \frac{M_1 - M_2}{\sqrt{\dfrac{s^2}{N_1} + \dfrac{s^2}{N_2}}} $$

The estimate, $s^2$, of the common population variance is obtained by computing the sum of squares separately for the 2 samples, combining these sums, and dividing by the proper df, or

$$ s^2 = \frac{\Sigma(X - M_1)^2 + \Sigma(X - M_2)^2}{N_1 + N_2 - 2} $$

The 2 separate sums are computed by formula (6a). Note that 2 degrees of freedom are lost because the sum of squares is about 2 means, which leads to 2 restrictions. Substituting the obtained $s^2$ in the above expression leads to a t. This t is looked up in Table E with df, or n, equal to $N_1 + N_2 - 2$, to see whether it reaches a chosen level of significance.

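A minimal Python sketch of the pooled-variance t follows. It assumes, as the text requires, that the 2 populations share a common variance. The function names and the 2 groups are hypothetical.

```python
import math


def sum_of_squares(xs):
    """Sigma x^2 about the group's own mean, by formula (6a)."""
    n = len(xs)
    return (n * sum(x * x for x in xs) - sum(xs) ** 2) / n


def independent_t(group1, group2):
    """t for the difference between 2 independent means, assuming the 2
    populations share a common variance; returns (t, df)."""
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    s2 = (sum_of_squares(group1) + sum_of_squares(group2)) / df   # pooled estimate
    s_dm = math.sqrt(s2 / n1 + s2 / n2)
    d_m = sum(group1) / n1 - sum(group2) / n2
    return d_m / s_dm, df


t, df = independent_t([4, 5, 6], [1, 2, 3])   # hypothetical groups
```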
There is one point in the method of determining the $s^2$ needed for testing the significance of the difference between means which may have puzzled the student. The null hypothesis, combined with the assumption of equal population variances, implies that the 2 samples have been drawn from a single universe, or from 2 universes with the same mean and equal variances for the given measured trait. It might accordingly be assumed that the best estimate of the population variance would be obtained by taking the sum of squares about the combined mean rather than about the separate means. The former would give the better estimate of the variance if it were actually known that the 2 universe means were the same (or that only 1 universe was involved). However, there is always the possibility that the 2 universe means really differ. If this were true, taking the sum of squares about the combined mean would, in general, yield too large an $s^2$, for the simple reason that the real difference between groups would contribute to the variability of the 2 groups combined. (The student who has difficulty seeing this point should imagine what would happen to the variance of scores when 2 groups markedly different in means were combined.) It follows, therefore, that in the long run the best value for $s^2$ will be provided by summing the sums of squares about the 2 means.
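The parenthetical suggestion above can be tried numerically. The Python sketch below uses 2 hypothetical groups with markedly different means. It shows that the sum of squares about the combined mean equals the sums about the 2 separate means plus a between-means term. That extra term is what would inflate $s^2$ if the combined mean were used.

```python
def ss_about(xs, center):
    """Sum of squared deviations of xs about a given center."""
    return sum((x - center) ** 2 for x in xs)


g1 = [10, 11, 12]      # hypothetical group with a high mean
g2 = [1, 2, 3]         # hypothetical group with a low mean
m1, m2 = sum(g1) / len(g1), sum(g2) / len(g2)
combined = g1 + g2
m = sum(combined) / len(combined)

within = ss_about(g1, m1) + ss_about(g2, m2)        # about the 2 separate means
about_combined = ss_about(combined, m)              # about the combined mean
between = len(g1) * (m1 - m) ** 2 + len(g2) * (m2 - m) ** 2
print(within, about_combined, between)   # 4.0 125.5 121.5
```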

The procedure for setting a confidence interval when we have independent means is no different from that for correlated means. Simply take $D_M \pm t_a s_{D_M}$, where $t_a$ is the t, for the given df, required for significance at the P = a level. This will give limits for the P = 1 - a level of confidence. Suppose we wish the .99
confidence interval. This requires an a of .01, or, as it is sometimes written, $t_a = t_{.01}$, where $t_{.01}$ is found under the P = .01 column, opposite the df.
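The interval $D_M \pm t_a s_{D_M}$ for 2 independent groups can be sketched as below. The function name and the groups are hypothetical. The value 4.604 is the two-tailed .01 point of t for 4 df, as found in standard t tables.

```python
import math


def difference_limits(group1, group2, t_a):
    """D_M +/- t_a * s_DM for 2 independent groups, using the pooled s^2."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = sum(group1) / n1, sum(group2) / n2
    ss = sum((x - m1) ** 2 for x in group1) + sum((x - m2) ** 2 for x in group2)
    s2 = ss / (n1 + n2 - 2)
    s_dm = math.sqrt(s2 / n1 + s2 / n2)
    d_m = m1 - m2
    return d_m - t_a * s_dm, d_m + t_a * s_dm


# Hypothetical groups; 4.604 is the two-tailed .01 point of t for 4 df.
lo, hi = difference_limits([4, 5, 6], [1, 2, 3], t_a=4.604)
```

The limits come out at about -0.76 and 6.76. With samples this small, the .99 interval still includes zero even though the obtained difference is 3 points.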

Further note on degrees of freedom. Consider 2 independent groups with $N_1 = N_2 = N$, and also 2 groups of scores based on N cases (or N paired persons). For the former the df is $N_1 + N_2 - 2 = 2N - 2$, whereas for the latter the df is $N - 1$, even though in the paired situation the total number of persons is 2N. This may be (and has been) confusing to some. It seems as though the obviously better plan (matching) leads to a loss in df compared to the setup involving independent groups.

It is sometimes argued that the df would perhaps be larger if we worked not with the difference scores but with the 2 sets of scores. In that approach we would use the sums of squares of deviations for each set and the sum of cross products, since, as can be seen from p. 84,

$$ \Sigma(D - M_D)^2 = \Sigma x_1^2 + \Sigma x_2^2 - 2\Sigma x_1 x_2 $$

The df for the left-hand sum of squares is obviously $N - 1$. Since the right-hand side of the equation is merely an algebraic variant of the left-hand side, it does not seem reasonable to believe that the df's will differ for the 2 sides. Note that if we consider $\Sigma x_1^2$ as having $N - 1$ degrees of freedom, we cannot have any more degrees of freedom for the other sums on the right side. The $X_2$ values are not independent of the $X_1$ values; they (the $X_2$ scores) are not "free to vary."
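The identity above is easy to confirm numerically. The Python sketch below uses hypothetical paired scores and computes both sides from deviations about the respective means.

```python
def deviations(xs):
    """Deviations of xs from their own mean."""
    m = sum(xs) / len(xs)
    return [x - m for x in xs]


x1 = [5, 7, 6, 9]      # hypothetical paired scores
x2 = [3, 4, 5, 6]
d = [a - b for a, b in zip(x1, x2)]

lhs = sum(v * v for v in deviations(d))                  # Sigma (D - M_D)^2
a, b = deviations(x1), deviations(x2)
rhs = (sum(v * v for v in a) + sum(v * v for v in b)
       - 2 * sum(p * q for p, q in zip(a, b)))           # Sigma x1^2 + Sigma x2^2 - 2 Sigma x1 x2
print(lhs, rhs)   # 2.75 2.75
```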

Comparison of changes. In the last chapter (p. 90) we discussed the procedures for testing the differences between changes shown by 2 groups. For the situation involving paired persons, a D for the difference between changes for the members of a pair was defined (p. 93). The test of significance involved computing, for D's so defined, an $M_D$, a $\sigma_D$, and thence $\sigma_{M_D}$. For the small-sample, or t, technique we need $s_D$ and $s_{M_D}$, just as given above for correlated means. The df is 1 less than the number of pairs.

For the setup involving the changes for independent groups, we would need a small-sample estimate of the standard error of the difference between the 2 mean changes, in place of the large-sample value used on p. 92. The required estimate is given by

$$ s_{M_{D_E} - M_{D_C}} = \sqrt{\frac{s_D^2}{N_E} + \frac{s_D^2}{N_C}} $$

in which

$$ s_D^2 = \frac{\Sigma(D - M_{D_E})^2 + \Sigma(D - M_{D_C})^2}{N_E + N_C - 2} $$

with the subscripts E and C referring to experimental and control groups. Thus, the procedure for testing hypotheses involving changes for 2 groups is precisely the same as that for testing the difference between 2 independent means, discussed above, with X replaced by D, a difference score.
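For independent experimental and control groups, the comparison of changes reduces to the independent-means t applied to change scores. The Python sketch below shows this. The function names and the pre- and post-test scores are hypothetical, and each change is taken as post minus pre.

```python
import math


def change_scores(pre, post):
    """D = post - pre for each person, subtracting in the same direction."""
    return [b - a for a, b in zip(pre, post)]


def compare_changes(pre_e, post_e, pre_c, post_c):
    """t for the difference between the mean changes of an experimental (E)
    and a control (C) group, with D replacing X in the independent-means t."""
    d_e, d_c = change_scores(pre_e, post_e), change_scores(pre_c, post_c)
    n_e, n_c = len(d_e), len(d_c)
    m_e, m_c = sum(d_e) / n_e, sum(d_c) / n_c
    ss = sum((d - m_e) ** 2 for d in d_e) + sum((d - m_c) ** 2 for d in d_c)
    df = n_e + n_c - 2
    s2_d = ss / df                     # pooled s_D^2
    se = math.sqrt(s2_d / n_e + s2_d / n_c)
    return (m_e - m_c) / se, df


t, df = compare_changes([10, 12, 11], [14, 15, 16], [10, 11, 12], [11, 12, 14])
```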

One-tailed versus two-tailed test. Our discussion of the t technique so far has been in terms of the t value needed for a two-tailed test at a given level of significance. The hypothesis to be tested or the decision to be made may logically warrant a one-tailed test. In that case, the t required for significance at the .01 level would be found under the .02 column of Table E, and for the .05 level the .10 column would be used. Those who do not wish to be restricted to the P levels given in Table E can find, for df's up to 20, the P associated with any t in Table XLV of Peters and Van Voorhis' Statistical procedures and their mathematical bases. This table gives one-tailed values, which need, of course, to be doubled for two-tailed tests.
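When no table is at hand, the one-tailed P for any t can be approximated by integrating the t curve. The Python sketch below uses Simpson's rule; the function names and the step count are assumptions of this illustration. Doubling the one-tailed value gives the two-tailed P, as the text notes. The sketch also shows that the one-tailed .01 point coincides with the two-tailed .02 point.

```python
import math


def t_density(t, n):
    """Height of the t curve for n df (log-gamma form)."""
    log_c = (math.lgamma((n + 1) / 2) - math.lgamma(n / 2)
             - 0.5 * math.log(n * math.pi))
    return math.exp(log_c) * (1 + t * t / n) ** (-(n + 1) / 2)


def one_tailed_p(t, n, steps=2000):
    """Proportion of the t curve above t: one half minus the area from 0 to t
    (Simpson's rule; steps must be even)."""
    h = t / steps
    total = t_density(0.0, n) + t_density(t, n)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, n)
    return 0.5 - total * h / 3


def two_tailed_p(t, n):
    """Double the one-tailed value, as the text directs for two-tailed tests."""
    return 2 * one_tailed_p(abs(t), n)


print(round(two_tailed_p(2.228, 10), 3), round(one_tailed_p(2.764, 10), 3))
```

For 10 df, a t of 2.228 gives a two-tailed P near .05. A t of 2.764 gives a one-tailed P near .01, which is the same as a two-tailed P near .02.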

Some comments and cautions. It might be thought that the assumption of normality underlying the use of t could be tested on the basis of the sample (or samples) at hand. One way would be to test the departure of $g_1$ (skewness) and $g_2$ (kurtosis) from zero; another would be a chi square technique to be discussed in Chapter 13. However, these methods of testing for normality are not sensitive enough to lead one to reject, on the basis of a small sample, the hypothesis of normality unless the departure therefrom is very marked.

Likewise, the as yet undiscussed test (see Chapter 14) for a possible difference between variances is too insensitive when used with small samples. It will not lead to rejection of the hypothesis of equal variances unless the difference between the 2 universe variances is sizable. Hence it is difficult to be sure that the assumption of equality of variances is tenable when 2 groups are being compared by the t technique.

The foregoing statements are, of course, based on the following proposition. By statistical methods one can prove, at a desired level of significance, that a sample distribution did not arise from a normally distributed universe, or that 2 universe values are different. But such methods will neither prove normality nor prove that 2 universe values are identical.

A method for testing the significance of the difference between 2 means when the assumption of equality of variances is not tenable may be found in section 4.14 of Experimental designs by Cochran and Cox. Some methods for handling nonnormal data are given later (Chapter 18).

Suppose that in 1 study the difference between 2 means for 2 small samples leads to a t which falls at the .01 level. In another study, 2 large samples yield means, for another trait, which are also significantly different at the .01 level. Can we place as much reliance on the first difference as on the second? The answer is yes, provided two conditions hold:

- The 2 studies have been carried out with the same degree of care as regards controls and adequate sampling techniques.
- It is safe to presume that the fundamental assumptions underlying t are tenable.

Thus our confidence in a result based on small samples is a function not only of the probability level of significance attained but also of our faith that the assumptions have been met. As we have suggested, the conditions of trait normality and equality of variances are exceedingly difficult to demonstrate when the only information available comes from the small samples at hand. We are therefore forced to conclude that, in general, we cannot place as much reliance on the results from small samples as on those from large samples.

This raises the question of the place of small samples in psychological research, and about this there will be a diversity of opinion. We do not propose to settle the issue or even debate it; instead, we shall mention a few points which we feel are pertinent. There are, of course, types of research for which it is impossible, or practically impossible, to secure more than a few cases, either because of their scarcity or because of prohibitive costs. For such situations it is fortunate that the small-sample or t technique, which permits some allowance for the smallness of the sample or samples, is available. Quite frequently small samples may be useful in a preliminary study which is carried out solely for the purpose of guiding the experimenter. If given hypotheses seem to be verified, the next step should be to secure more cases for further verification, rather than to rush into print with positive conclusions.

It seems to the writer that those who publish statistical results based on a small number of cases should, unless they are positively sure that the basic assumptions underlying t have been met
(and this assurance can seldom be attained), adopt a more stringent level of significance than they would adopt if they had large samples. Admittedly, a more stringent criterion of significance means that the null hypothesis may be less frequently rejected, and consequently that a real difference may be overlooked. At this point some readers may need to be reminded that the best way to avoid committing type II errors is to avoid the use of small samples: the greater the number of cases, the greater the likelihood of detecting a difference.

An illustration may be in order of the fact that small samples do not favor rejection of the null hypothesis unless the difference between universe values is sizable. Let us suppose that the means for the heights of 2 populations are 64.5 and 68.0, and that the universe standard deviations are both equal to 2.7. An investigator who does not know these facts draws a random sample of 8 cases from each universe. In order to help him a little (and also to simplify this discussion), we tell him that each $\sigma_{pop}$ = 2.7. The standard error of the difference between means becomes $2.7\sqrt{\frac{1}{8} + \frac{1}{8}}$, or 1.35.

If the investigator accepts the .01 level of significance, it is immediately apparent that an obtained difference would have to be at least (2.58)(1.35), or 3.48, for him to reject the null hypothesis. (Why are we justified in using the normal deviate, 2.58, with such small samples?) The sampling distribution of differences between means will center at 3.5. A little consideration of this fact indicates that the chances are nearly 50-50 that the investigator will accept the null hypothesis, even though the real difference is more than a standard deviation in magnitude.
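The "nearly 50-50" figure can be checked with the normal curve, since in this illustration the universe sigma is supplied. The Python sketch below assumes, as the text does, that obtained differences are normally distributed about the true difference of 3.5, with standard error 1.35. It computes the chance that a difference falls beyond ±(2.58)(1.35). The function names are hypothetical.

```python
import math


def normal_cdf(z):
    """Area of the unit normal curve below z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def chance_of_rejecting(true_diff, se, z_crit):
    """Probability that an obtained difference falls beyond +/- z_crit * se
    when differences are normally distributed about true_diff with SD se."""
    cutoff = z_crit * se
    upper = 1 - normal_cdf((cutoff - true_diff) / se)
    lower = normal_cdf((-cutoff - true_diff) / se)
    return upper + lower


se = 2.7 * math.sqrt(1 / 8 + 1 / 8)
p_reject = chance_of_rejecting(68.0 - 64.5, se, z_crit=2.58)
print(round(se, 2), round(p_reject, 3))   # 1.35 0.505
```

The chance of rejection is barely above one half, so the chance of wrongly accepting the null hypothesis is close to one half as well.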

There are times when an investigator may be so anxious to accept the null hypothesis that he will seize upon a very high level of significance in order to better his chances of accepting the hypothesis of no difference. Another way of increasing the odds in favor of accepting the null hypothesis is to use exceedingly small samples. Now those who desire to claim that no difference exists must face the simple fact that such a proposition can never be proved on a sampling basis. The most convincing way to demonstrate that a difference is of no practical or scientific importance is to use large samples and the confidence interval method for specifying limits for the population difference.



CHAPTER 8 


Correlation: Introduction and Computation 


One of the chief tasks of a science is the analysis of the interrelations of the variables with which it deals. In the physical sciences, and frequently in the biological sciences, the interrelations can be determined by noting how much of a change in one variable is associated with change in another. The physicist studying the relationship between pressure exerted by a gas and temperature can vary the latter at will so as to determine the pressure at different temperatures. In the social sciences, and sometimes in the biological sciences, the variables studied are apt to be characteristics of individuals (plant or animal). Thus, to study relationships, the experimenter is compelled to make measurements on several individuals. For example, if 2 variables such as height and weight are under consideration, the measured height and weight of N individuals will provide N pairs of observations. From these it can be determined whether the 2 vary together. In either case it is important to determine the form (mathematical) of the relationship and the accuracy with which one can make predictions.

Many relationships are expressible in terms of the simplest of all mathematical forms, $Y = A + BX$, in which X and Y represent variables and A and B are constants determinable from the observations. The accuracy of prediction can be determined, and it is convenient to have some general measure of this accuracy. One such measure, which can be computed and which will yield information as to both the degree of accuracy and the degree of relationship, is the correlation coefficient, designated r. As we shall soon see, this measure of correlation not only tells us the degree of relationship but will also, in conjunction with the 2 means and
standard deviations, permit us to write the linear equation for predicting Y from X or X from Y.

Our present discussion will be concerned with the determination of relationship between such typical variables as height, weight, strength, age, intelligence, social status, and attitudes, i.e., with those variables which show variation from individual to individual. The question of the relationship between variables of this type can be stated quite simply: Is there a tendency for the individual who ranks high (or low) on one characteristic to be high (or low) on another also? It should be noted that at times a relationship may involve just 1 variable: Are heights of sons related to the heights of their fathers? Are the IQ's of adults related to their childhood IQ's?


THE SCATTER DIAGRAM 

The first task is that of tabulation. Suppose we have observations on the height and weight of a large number of individuals. Using cross-section or coordinate paper, we can lay off convenient tabulating intervals for, say, height on the y axis and intervals for weight on the x axis. The rules for choosing intervals stated on p. 6 should be followed here. Tabulation then consists first of finding on the y axis the interval in which an individual's height falls, and then locating the interval on the x axis for his weight. A tally or dot is then placed in the cell formed by the intersection of these 2 intervals.

The result of such a two-way or cross tabulation is referred to as a scatter diagram or correlation table. It will contain as many tallies as there are pairs of observations. The tallies in each row, or horizontal array, can be counted and recorded, separately by rows, to the right of the diagram. This procedure will, of course, yield the frequency distribution for all individuals with respect to the variable on the y axis. A similar count of tallies for each column, or vertical array, recorded at the top, will yield the distribution for the other variable. The sum of the frequencies for either of these marginal distributions should equal N, the number of pairs of observations.
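The tallying procedure just described amounts to a two-way frequency count. A minimal Python sketch follows. The function name, the interval origins and widths, and the sample pairs are hypothetical. Intervals are assumed to be of equal width, counted upward from the stated origin.

```python
from collections import Counter


def scatter_diagram(pairs, x_origin, x_width, y_origin, y_width):
    """Cross-tabulate (x, y) observations into step intervals.

    Returns (cells, x_marginal, y_marginal): cells maps (d_x, d_y) to a
    frequency, where d_x and d_y count intervals up from the given origins."""
    cells = Counter()
    for x, y in pairs:
        d_x = int((x - x_origin) // x_width)
        d_y = int((y - y_origin) // y_width)
        cells[(d_x, d_y)] += 1
    x_marginal, y_marginal = Counter(), Counter()
    for (d_x, d_y), f in cells.items():
        x_marginal[d_x] += f      # column (vertical array) totals
        y_marginal[d_y] += f      # row (horizontal array) totals
    return cells, x_marginal, y_marginal


# Hypothetical pairs; intervals of 5 on X starting at 25, of 10 on Y starting at 60.
pairs = [(27, 62), (41, 95), (43, 98), (58, 141)]
cells, x_marg, y_marg = scatter_diagram(pairs, 25, 5, 60, 10)
assert sum(x_marg.values()) == sum(y_marg.values()) == len(pairs)
```

The final check mirrors the rule stated above: each marginal distribution must total N.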

Figures 12a and 12b are illustrative scatter diagrams, but they are not models so far as the number of grouping intervals is concerned. In practice, from 12 to 20 intervals should be used in order to reduce the grouping error to a negligible amount. It is to be understood
that the intervals in these charts are 40-44, 30-39, 50-59, etc. The student should study these diagrams so as to grasp some of the mechanical details involved in their construction. It should be noted that the number and size of the intervals for the 2 variables need not be the same. Also, the zero points on the scales of measurement need not appear or even be indicated on the axes.

It can readily be seen that these 2 diagrams represent different degrees of relationship. A precise method for measuring or describing degree of relationship, association, or correlation will be discussed in detail in the pages to follow. We shall begin with a symbolic definition of a basic correlation coefficient, indicate its computation, and then discuss its meaning, interpretation, assumptions, and finally its limitations. Certain elementary mathematical derivations will be either indicated or given whenever their inclusion seems useful in clarifying a point or clinching an assumption.

The Pearson product moment correlation coefficient is defined by

$$ r = \frac{\Sigma xy}{N\sigma_x\sigma_y} \qquad (29) $$

in which:

- x and y represent deviation measures from the respective means of the 2 variables, i.e., $x = X - M_x$ and $y = Y - M_y$;
- the sigmas in the denominator are the standard deviations of the 2 distributions;
- N is the number of individuals measured.

With reference to a scatter diagram, $M_x$ and $\sigma_x$ hold for the marginal distribution at the top, whereas $M_y$ and $\sigma_y$ hold for the distribution to the right. The numerator term, $\Sigma xy$, implies that the product of each individual's x and y is determined and that all such products are summed algebraically. There will, of course, be N products in this sum, some of which will be positive, some negative, and perhaps some zero.

Definition formula (29) is seldom used for computation. For N small, a usable computational equivalent is

$$ r = \frac{N\Sigma XY - \Sigma X\,\Sigma Y}{\sqrt{N\Sigma X^2 - (\Sigma X)^2}\,\sqrt{N\Sigma Y^2 - (\Sigma Y)^2}} \qquad (30) $$

which involves 4 familiar sums and the sum of the products of the paired raw scores. This formula is unwieldy for large N and/or



Calculation of r 




scores which are numerically large. For reasons which will become apparent later, the careful researcher will always make a scatter diagram. Once this has been done, it is economical to compute r in terms of step-interval deviations from arbitrary origins. An appropriate formula is

$$ r = \frac{N\Sigma d_x d_y - \Sigma d_x\,\Sigma d_y}{\sqrt{N\Sigma d_x^2 - (\Sigma d_x)^2}\,\sqrt{N\Sigma d_y^2 - (\Sigma d_y)^2}} \qquad (31) $$

in which $d_x$ is defined as an individual's score deviation, in step intervals, from an arbitrary origin on the X scale, and $d_y$ is defined similarly for the Y scale. The student will note the similarity of the radical terms to formula (5) for computing $\sigma$. Formula (31) calls for 2 sums, 2 sums of squares, and a sum of cross products, all in terms of step or interval deviations from arbitrary origins.

The arbitrary origins may be taken at the center or at the bottom of each distribution. The former involves handling smaller figures but has the disadvantage of introducing negative numbers. The latter scheme is better if a calculating machine is available.
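Formula (31) has the same form as formula (30), with step deviations in place of raw scores. Correlation is unchanged when scores are shifted to a new origin and divided by a positive interval width, so coding the scores into step deviations does not alter r. The Python sketch below illustrates this with hypothetical scores that sit exactly at interval points, so no grouping error enters. The function names are hypothetical.

```python
import math


def r_raw(xs, ys):
    """Formula (30): r from raw-score sums."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return (n * sxy - sx * sy) / (math.sqrt(n * sxx - sx * sx)
                                  * math.sqrt(n * syy - sy * sy))


def r_step(xs, ys, x_origin, x_width, y_origin, y_width):
    """Formula (31): the same expression applied to step deviations
    d = (score - origin) / interval width."""
    dx = [(x - x_origin) / x_width for x in xs]
    dy = [(y - y_origin) / y_width for y in ys]
    return r_raw(dx, dy)


xs = [25, 30, 40, 50, 60]      # hypothetical scores at interval points
ys = [60, 80, 90, 120, 140]
print(round(r_raw(xs, ys), 6), round(r_step(xs, ys, 25, 5, 60, 10), 6))
```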

CALCULATION OF r 

The computation of r will be illustrated for both hand and machine calculating methods. The hand calculation scheme used here may not be quite as economical as other available schemes. However, this particular setup has the advantage that it forms an economical basis for machine computation. The author presumes that practically all those who are apt to compute more than a few r's will have access to a calculating machine of the Monroe, Marchant, or Friden type. Once the steps involved in the hand calculation form are grasped, it becomes easy to transfer them to machine work.

The writer has never found the commercial correlation charts helpful. All one needs is a sheet of cross-section paper ruled 4 lines to the inch, on which one can readily lay out the axes, in intervals, for tabulating or tallying. First the scatter diagram is made, and the tally (or dot) marks are summed across and up to get the marginal frequencies (as shown in Figs. 12a and 12b). Then the d values, taken from an arbitrary origin at the bottom-most interval for each variable, can be written, preferably with colored lead, alongside the marginal frequencies
(see Table 10). The columns of fd and fd² values along each margin can be obtained by multiplying in exactly the same manner as was previously done for calculating the standard deviation. The sums of these columns provide 4 of the 5 sums needed for r.


Table 10. Computation of r

Y \ X      25   30   35   40   45   50   55   60  |   f  dy  fdy  fdy²  Σdx  dy·Σdx
140                                  1    1    1  |   3   8   24   192   18     144
130                             1    1    3    1  |   6   7   42   294   34     238
120                        1    2    2    1    1  |   7   6   42   252   34     204
110                   1    2    3    2    2       |  10   5   50   250   42     210
100                   2    3    4    2    1       |  12   4   48   192   45     180
 90              1    2    3    2    1            |   9   3   27    81   27      81
 80              2    3    2    1                 |   8   2   16    32   18      36
 70         1    2    1                           |   4   1    4     4    4       4
 60         1         1                           |   2   0    0     0    2       0
f           2    5   10   11   13    9    8    3  |  61      253  1297  224    1097
dx          0    1    2    3    4    5    6    7
fdx         0    5   20   33   52   45   48   21  | 224
fdx²        0    5   40   99  208  225  288  147  |1012

$$ r = \frac{(61)(1097) - (224)(253)}{\sqrt{(61)(1012) - (224)^2}\,\sqrt{(61)(1297) - (253)^2}} $$

Space limitations account for the use of too few intervals in this table. A complete labeling of intervals would be 25-29, etc., and 60-69, etc.

In order to obtain Σdxdy, each individual's dx must be multiplied by his dy, and all such products then summed. In the 140 interval on the y axis we find 1 individual whose score on the X variable falls in the 50 interval on the x axis. In terms of step deviations his dy value is 8 and his dx value is 5, and therefore 5 times 8, or 40, represents his dxdy product. Another individual with the same dy value has a dx value of 6, whence 6 times 8 is his contribution to Σdxdy. The third individual in the 140 interval has a dx value of 7, whence 7 times 8 is his product. These 3 individuals contribute 5 × 8 + 6 × 8 + 7 × 8, or 144, to the sum of products. The dy value of 8 is a common factor to these 3 products, whence 8(5 + 6 + 7), or 8 × 18, yields 144. This suggests a scheme for computing the Σdxdy sum which involves first summing the dx values for a particular Y interval or array and then multiplying this sum by the dy value. Thus the dx values of the individuals in the 130 interval sum to 34, and in the 120 interval to 34, and so on down to the 60 interval, which yields 2 as the sum of the dx values. The determination of these dx sums is greatly facilitated by the use of a runner on which the dx values 0, 1, 2, 3, ..., have been labeled to correspond exactly with the deviations in step intervals alongside the marginal distribution at the top of the diagram. Since each of these dx sums is to be multiplied by a dy value and then all the products summed, it is convenient first to record the dx sums to the right as a separate column and then to multiply each dx sum by the corresponding dy value, thus leading to the last column of figures. Before these final multiplications are made, the column of dx sums should be added to see whether it agrees with the Σdx already computed from the marginal distribution of X scores. Thus an internal check is provided for the column of dx sums; all other computations should be done twice in order to insure accuracy.
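The array-sum shortcut can be sketched in a few lines of Python. The dy values and array sums of dx are those of the Table 10 margins; the dictionary layout and variable names are our own illustrative choices, not part of the original work sheet.

```python
# Table 10, row by row: Y interval -> (dy, sum of dx for the cases in that array).
rows = {
    140: (8, 18), 130: (7, 34), 120: (6, 34), 110: (5, 42), 100: (4, 45),
    90: (3, 27), 80: (2, 18), 70: (1, 4), 60: (0, 2),
}

# The 140 row worked by hand in the text: three individuals with dx = 5, 6, 7.
dx_in_top_row = [5, 6, 7]
dy_top = 8
one_by_one = sum(dx * dy_top for dx in dx_in_top_row)   # 5*8 + 6*8 + 7*8
factored = dy_top * sum(dx_in_top_row)                    # 8 * (5 + 6 + 7)
assert one_by_one == factored == 144

# Whole table: sum each array's dx values first, then multiply by that array's dy.
sum_dxdy = sum(dy * dx_sum for dy, dx_sum in rows.values())

# Internal check: the column of dx sums must agree with Σdx from the X margin.
column_of_dx_sums = sum(dx_sum for _, dx_sum in rows.values())

print(sum_dxdy, column_of_dx_sums)  # 1097 224
```

Factoring out dy is what lets each array contribute one multiplication instead of one per individual.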

When a calculating machine is available the work sheet need not include the fd and fd² columns, since the sums of these 2 columns can readily be obtained by the method discussed on pp. 23-24. This means that the column of dx sums can be placed alongside the dy values; then each dx sum can be multiplied by the juxtaposed dy value, with the products allowed to accumulate in the dial as the needed Σdxdy. Thus the right-hand column of figures need not appear on the work sheet.

The substitution of the 5 sums into formula (31) is straightforward. The denominator factors are evaluated as explained on p. 24, and the numerator is obtained by punching Σdxdy into the keyboard and multiplying by N; then, with the product left in the lower dial, Σdx is subtracted Σdy times. If needed, the 2 means can be obtained by substituting Σdx and Σdy into (3), and the 2 standard deviations by multiplying the proper radical by the interval size and dividing by N [equivalent to substituting the sum and sum of squares into (5)].
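As a check on the arithmetic, here is a minimal Python sketch of formula (31) applied to the five sums of Table 10, together with the standard deviations obtained as "radical times interval size, divided by N." The interval widths of 5 (X) and 10 (Y) are an assumption read from the footnote's labeling 25-29 and 60-69; the means are omitted because they also depend on where the origin of each coded scale is placed.

```python
import math

# The five sums from Table 10 (step-deviation units).
N = 61
sum_dx, sum_dx2 = 224, 1012     # X margin: Σdx and Σdx²
sum_dy, sum_dy2 = 253, 1297     # Y margin: Σdy and Σdy²
sum_dxdy = 1097                 # Σdxdy from the column of array products

# Formula (31): r from the coded sums.
numerator = N * sum_dxdy - sum_dx * sum_dy
radical_x = math.sqrt(N * sum_dx2 - sum_dx ** 2)
radical_y = math.sqrt(N * sum_dy2 - sum_dy ** 2)
r = numerator / (radical_x * radical_y)

# Standard deviations in score units: radical * interval width / N.
# Widths 5 and 10 are assumed from the footnote's interval labels.
sigma_x = radical_x * 5 / N
sigma_y = radical_y * 10 / N

print(f"r = {r:.3f}, sigma_x = {sigma_x:.2f}, sigma_y = {sigma_y:.2f}")
```

Note that r itself does not depend on the interval widths, since coding by step deviations changes only the units.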



CHAPTER 


Correlation: Interpretations and Assumptions 


Intelligent use of the correlation coefficient and critical understanding of its use by others are impossible without knowledge of its properties. It is not sufficient that we be able merely to recognize r as a measure of relationship. It is a peculiar kind of measure which permits certain interpretations provided certain assumptions are tenable and provided one considers possible disturbing factors. Since the interpretations of r are so closely related to assumptions, no attempt will be made to present a separate discussion of these 2 aspects. The factors which affect r, and which are therefore limitations additional to assumptions, will be discussed in Chapter 10.

STUDY OF SCATTERGRAM 

We shall begin by making a somewhat detailed study of certain properties of a typical scatter diagram. The columns and rows of the diagram have already been referred to as vertical and horizontal arrays, the intersection of 2 arrays has been called a cell, and the meaning of the marginal distributions has been given. If the scatter diagram depicted in Table 11 is examined, it will be noted that each vertical (and also each horizontal) array contains a frequency distribution, and that the marginal totals really represent the number of cases in these array distributions. These array distributions are very much like any other typical distribution: bell-shaped, with a clustering or scattering about a central value. The mean and standard deviation again become useful descriptive terms. Thus, in Table 11, the mean height of sons whose fathers were 64 inches tall is found to be 66.8 inches. This is simply the mean of the 12 cases which fall in this particular array. Similarly for all the vertical arrays we have the means as recorded along the bottom of Table 11. The means of the horizontal array distributions have been recorded to the right of the scatter diagram. For example, the mean height of the 10 fathers whose sons were 72 inches tall is 70.0 inches.


Table 11. Correlation Table for Height of Fathers (X) and Height of Sons (Y)

[Scatter-diagram cell tallies not reproduced. Heights in inches.]

Vertical arrays (fathers' heights; My = mean height of sons in the array):
X       62    63    64    65    66    67    68    69    70    71    72    73
f        2     6    12    19    27    26    26    26    20    15     8     5
My    65.5  66.5  66.8  66.8  67.6  67.8  68.6  69.1  69.5  70.6  70.3  72.0

Horizontal arrays (sons' heights; Mx = mean height of fathers in the array):
Y       75    74    73    72    71    70    69    68    67    66    65    64    63
f        1     2     5    10    19    25    30    34    29    15    12     8     2
Mx    71.0  72.0  70.6  70.0  69.1  68.5  67.8  67.7  66.7  66.3  65.8  64.7  65.5

r = .56     N = 192
Mx = 67.69     σx = 2.49
My = 68.44     σy = 2.33
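The marginal figures of Table 11 can be cross-checked with a short Python sketch. It recomputes N and the two means from the marginal frequencies, and shows that weighting each vertical-array mean by its frequency recovers the overall mean of Y. The agreement is only approximate because the printed array means are rounded to 0.1 inch; the list names are our own.

```python
# Marginal frequencies and vertical-array means from Table 11 (inches).
fathers = [62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73]        # X values
n_per_column = [2, 6, 12, 19, 27, 26, 26, 26, 20, 15, 8, 5]        # cases per vertical array
mean_son_in_column = [65.5, 66.5, 66.8, 66.8, 67.6, 67.8,
                      68.6, 69.1, 69.5, 70.6, 70.3, 72.0]          # My within each array

sons = [75, 74, 73, 72, 71, 70, 69, 68, 67, 66, 65, 64, 63]         # Y values
n_per_row = [1, 2, 5, 10, 19, 25, 30, 34, 29, 15, 12, 8, 2]         # cases per horizontal array

N = sum(n_per_column)
M_x = sum(x * n for x, n in zip(fathers, n_per_column)) / N
M_y = sum(y * n for y, n in zip(sons, n_per_row)) / N

# Frequency-weighted mean of the array means should reproduce M_y (up to rounding).
M_y_from_arrays = sum(m * n for m, n in zip(mean_son_in_column, n_per_column)) / N

print(N, round(M_x, 2), round(M_y, 2), round(M_y_from_arrays, 2))
```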


If the means of the vertical arrays are plotted (see crosses in Fig. 13), two things will be noticed: the means are progressively greater as we pass from short to tall fathers, and they fall approximately on a straight line. It will be noted (see dots in Fig. 13) that the means for the horizontal arrays also approximate a line and show progression. Now, with reference to the means of the vertical arrays, each represents the mean height of sons of fathers of a particular height and therefore may be used as a basis for predicting the height, if unknown, of a man if we have been told the height of his father. Thus, if the father is 66 inches tall, the best estimate of his son's height is 67.6, the observed mean height of men whose fathers are 66 inches in height.

[Fig. 13. Plot of array means for data of Table 11: crosses mark the means of the vertical arrays, dots the means of the horizontal arrays; fathers' height (62 to 73 inches) on the x axis, sons' height on the y axis.]

Obviously such an estimation would be subject to considerable error, since we have also the observable fact that the heights of sons of fathers 66 inches tall show a large amount of variation about the array average. This variation tells us something about the possible magnitude of the error involved in using 67.6, the array mean, as our estimated value. The unknown height, of which we take an array mean as an estimate, may actually fall anywhere within rather wide limits on either side of the array mean. These limits can be described in terms of the standard deviation of the array distribution, i.e., the error of estimate can be stated in terms of a σ. The standard deviation for the distribution of heights of sons whose fathers were 66 inches in height is about 2.1. Now, if we take 67.6 as the best estimate, we can say that, if we were to predict the height of 100 sons (fathers 66 inches), about 68 per cent of the time the error would be within the limits 67.6 ± 2.1, 95 per cent within 67.6 ± 4.2, and nearly always within the limits 67.6 ± 6.3. Likewise, when the sigmas for the several arrays have been computed, a statement of the limits of the error in predicting any son's height from his father's height can be made. Such a procedure will yield as many measures of error as there are vertical arrays. We shall soon see that a convenient assumption can be made which will usually allow us to use a single indication of the error of estimate.
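The percentages quoted above follow from treating the array distribution as normal. A minimal Python sketch, assuming normality and using the array mean 67.6 and σ of about 2.1 from the text, computes the limits and the normal-curve proportion inside each.

```python
import math

array_mean = 67.6   # mean height of sons whose fathers are 66 inches (Table 11)
array_sigma = 2.1   # approximate standard deviation of that array, as stated in the text

def normal_coverage(k):
    """Proportion of a normal distribution lying within ±k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    low, high = array_mean - k * array_sigma, array_mean + k * array_sigma
    print(f"±{k} sigma: {low:.1f} to {high:.1f} inches, covering {normal_coverage(k):.1%}")
```

The ±2σ band actually covers about 95.4 per cent, which the text rounds to 95 per cent; ±3σ covers about 99.7 per cent ("nearly always").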

Let us return again to the line of the means. Two such lines have been drawn in Fig. 13; one line "fits" the means of the vertical, the other the means of the horizontal, arrays. Let us for the present confine our attention to the means of the vertical arrays. They do not lie exactly on the drawn line; some are above, some below. If they fell exactly on the line, a prediction based on an array mean would be precisely the same as a prediction obtained by noting the Y value of the line where it cuts the middle of the array. Furthermore, if the means were exactly on a straight line, we might write the equation for this line in the form Y = BX + A, where A equals the y intercept (value of Y where the line crosses the y axis) and B equals the slope of the line (the inclination of the line to the x axis). With A and B known, the value of Y for a particular X can be readily estimated.

But, since the means do not lie exactly on a straight line, the above reasoning would not seem offhand to yield us anything of practical value. From many viewpoints, however, it is desirable that we determine the equation of the straight line which best "fits" the means, i.e., the equation of a line which passes near all the means. Then we can use this equation instead of the array means in making predictions. The justification for this procedure depends upon the validity or tenability of an assumption: we assume that the failure of the means to fall exactly on a straight line is due to chance fluctuations in the means. Each array mean is based on a sample and consequently deviates more or less from the true or population value of the mean for the array. This is equivalent to saying that, if all the array means were based on a much larger number of cases, we could assume that they would approximate more exactly a straight line. This is an assumption which can always be made provided the array means for a particular scatter do not show marked deviations from linear form. (Adequate checks in terms of probability, to be described later, can be utilized to ascertain whether the fluctuations from linearity are larger than is reasonable on the basis of chance.)

THE BEST-FIT LINE 

We can now consider one of the advantages of using a line instead of the several array means as a basis for prediction. The location of the line is dependent upon all the means, or rather upon all the cases. It therefore seems reasonable to believe that the line would be more stable from the sampling viewpoint than would the array means, each of which is based on a rather small number of cases.

If we accept the assumption of linearity of array means, our problem is that of determining A and B so that we can write the equation of the line of means. We need the equations of two lines: Y = BX + A for the means of the vertical arrays and X = B'Y + A' for the horizontal array means. We shall consider the determination of the constants A and B for the first equation, but before doing so something must be said concerning what is meant by a "best-fit" line. The constant A gives the y intercept, i.e., tells us where the line cuts the y axis. Suppose we think of several possible lines having the same slope (the same B) as the line in Fig. 13 which passes near the crosses. Obviously, if we considered a line passing near the top or bottom of the scatter diagram, it would be a "worse fit" than that drawn in Fig. 13. Likewise, if we think of pivoting the line about some point, thereby altering its slope, it can be readily seen that rotating it to a vertical or horizontal position would give a worse fit. It should now be clear that the assigning of some values to A and B will lead to a worse fit than that obtained by certain other values, or conversely that some values will yield a much better fit than others.





One criterion which is accepted as a basis for a best-fit line is that the sum of the squares of the deviations from the line shall be as small as possible. With respect to determining the best-fit line to the means of the vertical arrays, this criterion or definition of fit implies that the values of A and B are to be such that the sum of the squared deviations of the observed heights of sons about the line (deviations in an up and down or vertical direction) will be a minimum. Stated in symbols, let Y' = BX + A, where Y' (read Y prime) is the value estimated from a given X, and let Y be the observed value. Then (Y − Y')² represents the squared deviation of any Y from the line or estimated value. The problem is so to choose A and B as to make Σ(Y − Y')² as small as possible. It is more convenient to deal with both the equation, y' = bx + a, and the sum, Σ(y − y')², in deviation units, with y' and y as deviations from My and x = X − Mx. This is merely the translation of the axes which makes the origin or reference point coincide with Mx and My. The student should visualize the meaning of this shift of axes. Note that the pattern of tallies is not changed by this simple transformation. Do you think that the slope B will equal the slope b? Will A = a? Let us keep the first question in abeyance and examine now the second question. Both A and a represent the y intercepts of the desired prediction line. If it is not immediately obvious to the student that A may not equal a, he should imagine that in Table 11 and Fig. 13 the axes have been moved so that the origin is at the center of the scatter diagram, and then ask himself where the line through the means of the vertical arrays would cut the new y axis. (Incidentally, it should be noted that the value of A cannot be read directly from Fig. 13 for the simple reason that the reference frame as drawn does not include the origin. The real y and x axes of the original measures would be, respectively, to the left of, and lower than, the indicated axes.)

It is of interest to speculate concerning the value of a in the equation y' = bx + a. Common sense would suggest that, if an individual were average on X, the best guess would be that he would be average on Y. That is, if X = Mx, one would expect Y' to equal My. But, if an individual's X measure fell at Mx, his deviation, or x value, would be 0, and the estimated value of Y as being equal to My would in terms of deviation scores become 0. This would imply that the prediction line would pass through the origin of the deviation score reference axes, and consequently that the y intercept would be zero; hence a = 0. For the purpose of simplifying the determination of the best value for b, we ask the reader to accept, on the basis of the above reasoning, that a = 0 for the best-fitting line. If we carried both a and b along in the following development, a would in fact turn out to be zero.

This permits us to write y' = bx as the equation for estimating y, in deviation units, from x, or deviation values of X. Our task becomes that of determining the value of b which will make Σ(y − y')² a minimum. Incidentally, it should be obvious that the discrepancy of any particular y value from the desired line has the same numerical value as the deviation of its corresponding original Y value from the line, and that Σ(y − y')² = Σ(Y − Y')². When we have determined the optimal value for b in y' = bx, we can readily pass back to the original reference frames, the gross score axes, by substituting for y' the value Y' − My, and for x, X − Mx. With a fixed as zero, i.e., with the y intercept equal to zero, we can think of the line as passing through the origin (deviation axes); i.e., its up and down location is fixed. Obviously, many lines could be drawn through the origin, and they would differ only as to slope, i.e., as to b. Of all possible lines which may be drawn through the origin, some will be closer than others to the observations (tallies) in toto. One might imagine several lines any of which would seem to constitute a good fit. As one takes lines with either greater or lesser slope than those of apparently good fit, the fits will become worse; and of those which seem to fit, some will actually be better than others. The student might think that it would only be necessary to draw what seems by inspection to be the best-fitting line, and then obtain its slope by actually measuring the angle which it makes with the horizontal (with needed adjustment to allow for the measurement units). The trouble with this procedure is that individuals would tend to disagree regarding which of several lines was really best; also, the measurement of angles would be none too exact. What we need is a procedure which is objective, a method that will yield the value of b which leads to the best possible fit in the sense of reducing the sum of the squares of the discrepancies to a minimum.

We set up the function

$$ f = \frac{\Sigma(y - y')^2}{N} = \frac{\Sigma(y - bx)^2}{N} $$

in which we have N deviations of the form y − y' or y − bx (since y' = bx). These deviations when squared, summed, and divided by N give us a quantity or function which is to be minimized by the proper choice of b. The value to be assigned to b can best be ascertained by the calculus.* This is done by taking the derivative of the function with respect to b, setting this derivative equal to zero, and then solving for b. Thus

$$ \frac{df}{db} = \frac{-2\Sigma x(y - bx)}{N} $$

which, set equal to zero and divided by −2, gives

$$ \frac{\Sigma x(y - bx)}{N} = 0 $$

or

$$ \frac{\Sigma xy - b\Sigma x^2}{N} = 0 $$

then

$$ \frac{\Sigma xy}{N} - b\frac{\Sigma x^2}{N} = 0 $$

The first or cross-product term involves the correlation coefficient as defined by formula (29), from which definition formula we see that Σxy/N = rσxσy, and since Σx²/N = σx², we have

$$ r\sigma_x\sigma_y - b\sigma_x^2 = 0 $$

or

$$ r\sigma_y - b\sigma_x = 0 $$

which gives

$$ b = r\frac{\sigma_y}{\sigma_x} $$

as the optimal value for b. We therefore have

$$ y' = r\frac{\sigma_y}{\sigma_x}\,x \tag{32} $$

as the equation for the best-fit line. This equation is in terms of deviation measures, and by proper substitution we get

$$ Y' - M_y = r\frac{\sigma_y}{\sigma_x}(X - M_x) $$

or

$$ Y' = r\frac{\sigma_y}{\sigma_x}X + \left(M_y - r\frac{\sigma_y}{\sigma_x}M_x\right) \tag{32a} $$

as the equation in terms of the original or gross scores. This is the form which we would use in predicting Y from X. Note that B = r(σy/σx) is the slope of this line and that the constant A is equal to the parentheses term.

* The student who has not studied the calculus will either take the first part of the following derivation on faith or, if skeptical, will dig into a calculus text to satisfy himself that no magic is involved here.
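A numerical check of the derivation may reassure the skeptical student. The Python sketch below uses a small set of hypothetical paired scores (invented purely for illustration, not data from the text). It confirms that the calculus solution b = Σxy/Σx² equals r(σy/σx), and that nudging either b or the deviation-score intercept a away from the solution can only increase f.

```python
import math

# Hypothetical paired scores, invented only to illustrate the algebra.
X = [2, 4, 5, 7, 8, 10]
Y = [3, 6, 5, 9, 8, 12]
N = len(X)

M_x, M_y = sum(X) / N, sum(Y) / N
x = [v - M_x for v in X]          # deviation scores
y = [v - M_y for v in Y]

sigma_x = math.sqrt(sum(v * v for v in x) / N)
sigma_y = math.sqrt(sum(v * v for v in y) / N)
r = sum(a * b for a, b in zip(x, y)) / (N * sigma_x * sigma_y)   # Σxy / (N σx σy)

# Setting df/db = 0 gives b = Σxy / Σx², which should equal r σy/σx.
b_calculus = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
b_formula = r * sigma_y / sigma_x

def mean_sq_error(b, a=0.0):
    """f = Σ(y - (bx + a))² / N, computed in deviation units."""
    return sum((yi - (b * xi + a)) ** 2 for xi, yi in zip(x, y)) / N

# Moving b or a away from the solution can only increase f.
f_min = mean_sq_error(b_formula)
assert all(mean_sq_error(b_formula + d) > f_min for d in (-0.05, 0.05))
assert all(mean_sq_error(b_formula, a) > f_min for a in (-0.1, 0.1))

# Gross-score form (32a): Y' = B X + A
B = b_formula
A = M_y - B * M_x
print(f"b = {b_formula:.4f}, A = {A:.4f}")
```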

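By similar reasoning the equation of the best-fit line to the means† of the horizontal arrays is found to be

$$ x' = r\frac{\sigma_x}{\sigma_y}\,y \tag{33} $$

which becomes

$$ X' = r\frac{\sigma_x}{\sigma_y}Y + \left(M_x - r\frac{\sigma_x}{\sigma_y}M_y\right) \tag{33a} $$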

Regression. Equations (32) and (33) in deviation score form and (32a) and (33a) in gross score form are known as regression equations, and the constants denoting slope are known as regression coefficients. It is assumed that prediction will be as accurate by means of a regression equation as by way of array means, and it can readily be seen that by using a regression equation one can predict from intermediate values, e.g., 64½. This is of especial advantage with grouped data: the array mean is associated only with the midpoint value of the grouping interval, whereas the regression line is not so limited since it is continuous.

Rate of change. The results of the foregoing derivation make it clear that the correlation coefficient, along with the two means and the two standard deviations, enables us to write the equation by which either variable can be predicted from the other. The regression coefficients indicate the rate of change (unit of change in one variable per unit of change in the other), and in case the two standard deviations are equal, r itself indicates the rate of change. Thus we have one of the possible interpretations of the correlation coefficient.

† More strictly speaking, we are fitting a line to means weighted according to their respective frequencies, i.e., we are fitting a line to the observations.

For the correlation table in Table 11 we get, by proper substitution, the following as the regression equations:

Y' = .52X + 33.24 (to estimate son's height)

X' = .60Y + 26.63 (to estimate father's height)

The student should study Fig. 13 sufficiently to convince himself that .52 is the slope of the line passing near the crosses, and that .60 represents the slope (with reference to the vertical) of the line through the dots. The student should also satisfy himself that the constants 33.24 and 26.63 really represent the points at which the two lines intercept the y and x axes. Finally he should show that, if a father's height is at the mean of all fathers, the mean of the heights of all the sons is the best estimate of his son's height.
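The substitution can be reproduced in Python from the summary statistics printed with Table 11. One assumption on our part: the printed intercepts 33.24 and 26.63 come out exactly when each slope is first rounded to two places, so the sketch rounds before computing the intercepts. With unrounded slopes (about .524 and .598) the intercepts differ somewhat.

```python
# Summary statistics printed with Table 11.
r = 0.56
M_x, sigma_x = 67.69, 2.49    # fathers
M_y, sigma_y = 68.44, 2.33    # sons

# Regression coefficients (slopes) from (32a) and (33a).
B_yx = r * sigma_y / sigma_x      # about 0.524
B_xy = r * sigma_x / sigma_y      # about 0.598

# Round slopes to two places first (assumption that reproduces the printed constants).
b_yx, b_xy = round(B_yx, 2), round(B_xy, 2)
A_yx = M_y - b_yx * M_x           # intercept of Y' = .52X + A
A_xy = M_x - b_xy * M_y           # intercept of X' = .60Y + A'

def predict_son(father_height):
    """Estimated son's height from the Y-on-X regression line."""
    return b_yx * father_height + A_yx

print(b_yx, round(A_yx, 2), b_xy, round(A_xy, 2))
print(round(predict_son(M_x), 2))   # a father at M_x gets the mean son's height, M_y
```

The last line answers the student exercise: because A = My − BMx, substituting X = Mx always returns My.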

ACCURACY OF PREDICTION 

The next problem to which we turn is concerned with the accuracy of prediction by means of a regression equation. It has already been indicated that, when the mean of an array is used in prediction, the error of estimate is a function of the spread within that array. By introducing an assumption it becomes possible to substitute one measure of error in place of the several, numerically different, array standard deviations. An examination of the array distributions in Table 11 reveals that the vertical arrays differ from each other very little in dispersion (likewise, the horizontal arrays). If we were to compute the standard deviations for the vertical arrays, we would find differences, for this diagram, of such size as could readily be attributed to chance or sampling fluctuations; i.e., we assume that, if we had a much larger N, the array dispersions would be very nearly equal. Ordinarily this assumption of homoscedasticity can be met, and one measure of dispersion can be used for all the vertical arrays (and another for all the horizontal arrays).

Error of estimate. One such measure might be an average of the array σ's, but to determine this we would need first to compute all the σ's, a somewhat laborious job. Since we are to use the regression line, instead of array means, as a basis for prediction, we really need something corresponding to the σ about this line. Such a value can be obtained by noting that y − y' (or Y − Y') represents the discrepancy between estimated and observed values and that Σ(y − y')²/N is the mean of the squared deviations, the root of which will be the standard deviation of the discrepancies between estimated and observed values. This will be taken as the one standard deviation to replace the several standard deviations as our measure of the error of prediction. This particular standard deviation, defined as the square root of Σ(y − y')²/N, is called the standard error of estimate. It may be determined in two ways. First we can take a roundabout way which involves these steps: the prediction of each Y by use of equation (32a), or each y by use of (32); the calculation of the discrepancies (Y − Y') or (y − y'); squaring, summing, dividing by N, and taking the square root. A quicker method for determining the standard error of estimate is readily derived algebraically.

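Let σy·x stand for the standard error of Y as estimated from X; then, by definition,

$$ \sigma_{y\cdot x}^2 = \frac{\Sigma(Y - Y')^2}{N} = \frac{\Sigma(y - y')^2}{N} $$

but

$$ y' = r\frac{\sigma_y}{\sigma_x}\,x $$

by formula (32), whence

$$ \sigma_{y\cdot x}^2 = \frac{\Sigma\left(y - r\frac{\sigma_y}{\sigma_x}x\right)^2}{N} = \frac{\Sigma y^2}{N} - 2r\frac{\sigma_y}{\sigma_x}\cdot\frac{\Sigma xy}{N} + r^2\frac{\sigma_y^2}{\sigma_x^2}\cdot\frac{\Sigma x^2}{N} $$

Since Σy²/N = σy², Σxy/N = rσxσy, and Σx²/N = σx², this reduces to

$$ \sigma_{y\cdot x}^2 = \sigma_y^2 - 2r^2\sigma_y^2 + r^2\sigma_y^2 = \sigma_y^2 - r^2\sigma_y^2 $$

then

$$ \sigma_{y\cdot x} = \sigma_y\sqrt{1 - r^2} \tag{34} $$

By a similar line of reasoning it can be shown that

$$ \sigma_{x\cdot y} = \sigma_x\sqrt{1 - r^2} \tag{35} $$

which gives the standard error of X as estimated from Y.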
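The two routes to the standard error of estimate can be compared directly. In this Python sketch, again on hypothetical paired scores invented for illustration, the "roundabout" root-mean-square discrepancy about the line of (32a) agrees with σy√(1 − r²) from formula (34), as the algebra says it must when all σ's use N as divisor.

```python
import math

# Hypothetical paired scores (illustration only).
X = [2, 4, 5, 7, 8, 10]
Y = [3, 6, 5, 9, 8, 12]
N = len(X)

M_x, M_y = sum(X) / N, sum(Y) / N
sigma_x = math.sqrt(sum((v - M_x) ** 2 for v in X) / N)
sigma_y = math.sqrt(sum((v - M_y) ** 2 for v in Y) / N)
r = sum((a - M_x) * (b - M_y) for a, b in zip(X, Y)) / (N * sigma_x * sigma_y)

# Roundabout way: predict every Y with (32a), then take the RMS discrepancy.
slope = r * sigma_y / sigma_x
Y_pred = [slope * (v - M_x) + M_y for v in X]
se_direct = math.sqrt(sum((obs - est) ** 2 for obs, est in zip(Y, Y_pred)) / N)

# Quick way: formula (34).
se_formula = sigma_y * math.sqrt(1 - r ** 2)

print(f"direct = {se_direct:.4f}, formula (34) = {se_formula:.4f}")
```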

Thus the correlation coefficient not only enters into the prediction equations (32 to 33a), but also permits us to gauge the accuracy of prediction. It should be noted in passing that one can write the equation of a best-fit line without first determining r and that the error of prediction can also be ascertained without recourse to r. Such a method for determining the error of estimate has already been indicated: the square root of Σ(Y − Y')²/N, in which Y − Y' represents the computed discrepancy between observed and predicted values. This need not involve r unless the prediction equation is written in terms of r, as was done in (32a). The equation Y' = A + BX can be written in the form

$$ Y' = \frac{\Sigma X^2\,\Sigma Y - \Sigma X\,\Sigma XY}{N\Sigma X^2 - (\Sigma X)^2} + \frac{N\Sigma XY - \Sigma X\,\Sigma Y}{N\Sigma X^2 - (\Sigma X)^2}\,X \tag{36} $$

in which X and Y stand for gross or original measures. Formula (36) for the best-fitting line (least squares solution) does not involve means, σ's, or the correlation coefficient. If, as is frequently the case, one is interested in obtaining the equation for Y' only, it will be noticed that it is unnecessary to compute the sum of the Y squares, which is not, however, a tremendous saving of time. Perhaps the quickest way for determining the equation is by direct substitution into (36), but the determination of the error of estimate (sometimes called the closeness of fit of the line) is certainly facilitated by calculating r and σy and substituting in (34).
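Formula (36) translates directly into a function of raw sums. The sketch below is a minimal illustration; the function name `least_squares_line` and the data are hypothetical. As the text notes, ΣY² is never needed.

```python
def least_squares_line(X, Y):
    """Intercept A and slope B of Y' = A + BX from raw sums, as in formula (36).
    No means, sigmas, or r are needed, and the sum of Y squares is never used."""
    N = len(X)
    sum_x, sum_y = sum(X), sum(Y)
    sum_x2 = sum(v * v for v in X)
    sum_xy = sum(a * b for a, b in zip(X, Y))
    denom = N * sum_x2 - sum_x ** 2
    A = (sum_x2 * sum_y - sum_x * sum_xy) / denom
    B = (N * sum_xy - sum_x * sum_y) / denom
    return A, B

# Hypothetical data (illustration only).
X = [2, 4, 5, 7, 8, 10]
Y = [3, 6, 5, 9, 8, 12]
A, B = least_squares_line(X, Y)
print(f"Y' = {A:.4f} + {B:.4f}X")
```

The same line results from (32a); in particular A equals My − BMx, which the test below confirms.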

The standard error of estimate is to be interpreted as a standard deviation, and in so doing we are tacitly assuming that the array distributions are not only equal in dispersion but also normal. For the correlation diagram in Table 11, we have σy·x = 1.9, which is to be considered the standard deviation of the Y values about the regression line, Y' = .52X + 33.24. By use of this equation we would predict that the height of the son of a man 70 inches tall (X = 70) would be 69.6, and the error of estimate, 1.9, would be interpreted by saying that, if we made many such predictions, 68.26 times out of a hundred the actual height of sons of 70-inch fathers would be within the limits 69.6 ± 1.9, and nearly always within the limits 69.6 ± 3(1.9).
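The Table 11 figures can be verified with a short Python sketch that applies formula (34) to r = .56 and σy = 2.33 and then forms the prediction limits for a 70-inch father. Normal array distributions are assumed, as the text states.

```python
import math

r, sigma_y = 0.56, 2.33                    # Table 11
se_y_x = sigma_y * math.sqrt(1 - r ** 2)   # formula (34): about 1.93, printed as 1.9

predicted = 0.52 * 70 + 33.24              # son of a 70-inch father: about 69.6
for k in (1, 3):
    print(f"{predicted:.1f} ± {k}({se_y_x:.1f}) -> "
          f"{predicted - k * se_y_x:.1f} to {predicted + k * se_y_x:.1f}")
```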

This is a second method for interpreting the correlation coefficient in terms of the accuracy of prediction or closeness of fit of regression lines. If no correlation exists, the errors of estimate are σy·x = σy and σx·y = σx. In this connection it can be seen from formulas (32a) and (33a) that, when r = 0, the estimated Y, Y', becomes My, and X' becomes Mx. For example, if it has been established that the correlation between toe length and IQ is zero, we would always take 100 (the mean) as our best guess for an individual's IQ regardless of toe length. The error of estimate would of course be the standard deviation of the distribution of IQ's, and it would be said that toe length is useless in predicting IQ. The scatter diagram for IQ as Y and toe length as X would exhibit the following characteristics: first, the regression line Y' = A + BX would be horizontal, i.e., B would equal zero, and the means of the arrays would fluctuate about the value My, or A would equal My; and, second, all the array distributions would have dispersions approximately equal to σy. What would be the best guess as to the other regression line and the standard deviations of the horizontal arrays?

Now suppose the correlation between the variables were perfect (r = +1 or −1). The tallies in the scatter diagram would lie in a line, there would be no spreading about this line, the two regression lines would coincide, and no error would be involved in estimating X from Y or Y from X. That σy·x and σx·y would both be zero in case of perfect correlation is quite evident when one considers formulas (34) and (35).

At this point the student should note the difference between positive and negative correlation. In the case of a positive r, high score goes with high and low with low, whereas, for a negative r, high goes with low and low with high. With reference to the scatter diagram, a negative r typically involves a swarm of tallies stretching from the upper-left to the lower-right corner, whereas for a positive r the trend is from lower left to upper right (this assumes that the axes have been laid off in the conventional fashion). With reference to the regression equations, a negative r yields negative regression coefficients or negative slope for the lines. The student should be warned that an apparently negative r may in reality be positive. Thus, if one variable is a test performance scored in terms of time (or errors) and the other variable is scored in terms of amount done, the scatter diagram might show large time scores as going with small amounts of work done, i.e., high with low, which might be wrongly taken to indicate negative rather than positive correlation. Instead of asking whether high goes with high and low with low, it is safer to ask whether best goes with best. This rule, however, is difficult to apply when we are dealing with the interrelation of personality traits, especially those which do not readily permit of a statement as to which is the desirable end of the trait scale. The sign of the correlation coefficient in such cases always needs a qualifying statement which explicitly tells the direction of the relationship between the variables. Obviously, as far as accuracy of prediction is concerned, the error is the same for a negative and positive r of the same magnitude.
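The effect of scoring direction on the sign of r can be seen in a small Python sketch. The time and output scores are hypothetical, invented for illustration. Reversing the time scale so that a high score means better performance flips the sign of r but leaves its magnitude, and hence the error of estimate, unchanged.

```python
import math

def pearson_r(X, Y):
    """Product-moment r computed from deviation scores."""
    N = len(X)
    mx, my = sum(X) / N, sum(Y) / N
    sxy = sum((a - mx) * (b - my) for a, b in zip(X, Y))
    sxx = sum((a - mx) ** 2 for a in X)
    syy = sum((b - my) ** 2 for b in Y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical scores: seconds taken on test A (lower is better) and items done on test B.
time_a = [30, 42, 35, 55, 48, 60]
done_b = [20, 15, 18, 9, 12, 8]

r_raw = pearson_r(time_a, done_b)                     # negative: long times go with little done
r_rescored = pearson_r([-t for t in time_a], done_b)  # rescore A so that high = better
print(round(r_raw, 3), round(r_rescored, 3))
```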
Alienation. To return to the interpretation of the correlation coefficient by way of the standard error of estimate, we see that the factor in formulas (34) and (35) which involves r is √(1 − r²). It is the value of this which, when multiplied by the proper σ, leads to the error of estimate. The expression √(1 − r²) is called the coefficient of alienation. If r is zero, its value is 1 and the error of estimate is the σ for the variable being estimated. Table 12 gives the value of the coefficient of alienation for varying values of r. The student will do well to fix in mind the trend in this table. It will be noted that, compared to a correlation of zero, an r of .60 reduces the error of estimate by 20 per cent, whereas an r of .30 reduces it by about 5 per cent; that r must be as high as .866 before the error of estimate is reduced by one-half; and that the difference in reduction between an r of .70 and an r of .90 is approximately the same as that between .20 and .70. This interpretation of r is most useful and at the same time most disturbing, since the errors of estimate for r's in the vicinity of .40 to .70, values usually found and utilized in predicting success from test results, are discouragingly large.

Table 12. Values of the Coefficient of Alienation

  r      √(1 − r²)        r      √(1 − r²)
 .00       1.000         .60       .800
 .10        .995         .70       .714
 .20        .980         .80       .600
 .30        .954         .866      .500
 .40        .917         .90       .436
 .50        .866         .95       .312
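Table 12 and the percentage reductions quoted above can be regenerated with a few lines of Python. The function name `alienation` is our own label for √(1 − r²).

```python
import math

def alienation(r):
    """Coefficient of alienation, sqrt(1 - r^2): the factor applied to sigma in (34) and (35)."""
    return math.sqrt(1 - r * r)

for r in (0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.866, 0.9, 0.95):
    k = alienation(r)
    print(f"r = {r:<5}  k = {k:.3f}  error reduced by {1 - k:.1%}")
```

The printout makes the text's comparison concrete: going from .70 to .90 lowers the factor by about .278, while going from .20 to .70 lowers it by about .266.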

A somewhat different way of grasping the meaning of r, as it is applied to accuracy of prediction, is to square both sides of formula (34) and then solve explicitly for r. This leads to

$$ r = \sqrt{1 - \frac{\sigma_{y\cdot x}^2}{\sigma_y^2}} $$

from which it is readily seen that the correlation coefficient depends upon the accuracy of prediction relative to the total variance of the variable being predicted.
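Solving (34) for r can be checked numerically. The Python sketch below, with a hypothetical helper name, recovers r = .56 from the Table 11 values of σy and σy·x. Because the formula works through r², it yields only the magnitude of r, not its sign.

```python
import math

def r_from_error(se_y_x, sigma_y):
    """Magnitude of r implied by formula (34): r = sqrt(1 - se^2 / sigma^2).
    The sign of r cannot be recovered this way."""
    return math.sqrt(1 - (se_y_x / sigma_y) ** 2)

# Table 11: sigma_y = 2.33 and sigma_y.x = 2.33 * sqrt(1 - .56**2)
se = 2.33 * math.sqrt(1 - 0.56 ** 2)
print(round(r_from_error(se, 2.33), 2))
```

When σy·x equals σy, that is, when prediction is no better than guessing the mean, the function returns 0.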

It might be well at this time to bring together a few remarks concerning the assumptions involved in using and interpreting a correlation coefficient in terms of either rate of change or accuracy of prediction. When an r is reported, and no evidence to the contrary is given, one has a right to expect that the assumptions of linearity of regression and homoscedasticity have been met. The interpretation of r as rate of change definitely assumes linearity, and the interpretation in terms of the error of estimate definitely assumes both linearity and homoscedasticity. In certain special cases where the investigator is interested only in a one-way prediction, say Y from X, and there is no likelihood of ever reversing to predict X from Y, it will suffice if the regression of Y on X, i.e., for predicting Y from X, be linear and the Y or vertical array distributions be homoscedastic. The use of the correlation coefficient in predicting performance from age may be cited as an instance in which one need not worry about the possible nonlinear regression of age on score or the lack of homoscedasticity about this regression line.

The student may have observed that no assumptions have been made concerning the nature of the marginal distributions; the utilization of r does not assume normal distributions for the variables being correlated. The use of the standard error of estimate, however, assumes normality of the array distributions. As regards the possible effect of nonnormal marginal distributions, experience shows that nonlinearity, lack of homoscedasticity, or nonnormality of arrays may frequently be associated with skewness in one or both of the marginal distributions.





Although there are adequate checks for linearity and homoscedasticity, a careful scrutiny of the scatter diagram is usually sufficient to warn one of violent departures from these assumptions. Formula (30) and other nonplotting schemes for computing r give no inkling as to whether these assumptions are being violated and therefore cannot command the confidence of the careful investigator. The purpose of a research project might very well be the study of the relationship between two variables, but an end result in terms of a correlation coefficient, with no attention given to the form of the relationship, is inadequate.


VARIANCE AND CORRELATION 

A third method of interpreting r is in terms of variance. Before discussing this interpretation, we must introduce an important theorem concerning the variance of a sum (or difference). Suppose that variable W is made up of two parts U and V such that W = U + V. For example, the score on an arithmetic test might consist of two parts: score in addition and score in multiplication. Obviously, in deviation units w = u + v, and therefore the variance of the W variable is

$$
\begin{aligned}
\sigma_w^2 &= \frac{\sum w^2}{N} = \frac{1}{N}\sum (u + v)^2 \\
&= \frac{1}{N}\left(\sum u^2 + \sum v^2 + 2\sum uv\right) \\
&= \sigma_u^2 + \sigma_v^2 + 2r_{uv}\,\sigma_u\sigma_v
\end{aligned}
\tag{37}
$$

and in case U and V are independent, we have

$$ \sigma_w^2 = \sigma_u^2 + \sigma_v^2 \tag{37a} $$

If we are dealing with the difference, W = U − V, we have

$$ \sigma_w^2 = \sigma_u^2 + \sigma_v^2 - 2r_{uv}\,\sigma_u\sigma_v \tag{38} $$

and for U and V independent, we have

$$ \sigma_w^2 = \sigma_u^2 + \sigma_v^2 $$

which is identical with (37a). In words, the variance of a sum (or difference) of two independent variables is equal to the sum of their separate variances. Variances are additive, whereas standard deviations are not. It can be shown that, when U and V are distributed normally, their sum or difference will also yield a normal distribution.
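The theorem is easy to verify numerically. The sketch below uses invented addition and multiplication scores (hypothetical data, not from the text) and the population variance with divisor N, as in the derivation; `2 * cov_uv` plays the role of $2r_{uv}\sigma_u\sigma_v$.

```python
def variance(values):
    """Population variance (divisor N), as used in the text."""
    n = len(values)
    m = sum(values) / n
    return sum((v - m) ** 2 for v in values) / n


def covariance(a, b):
    """Mean cross product of deviations; equals r_uv * sigma_u * sigma_v."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / n


# Hypothetical part scores: U = addition, V = multiplication
u = [12, 15, 9, 14, 10, 18, 11, 13]
v = [8, 11, 7, 12, 6, 13, 9, 10]
w_sum = [a + b for a, b in zip(u, v)]
w_diff = [a - b for a, b in zip(u, v)]
cov_uv = covariance(u, v)

print(variance(w_sum), variance(u) + variance(v) + 2 * cov_uv)    # formula (37)
print(variance(w_diff), variance(u) + variance(v) - 2 * cov_uv)   # formula (38)
```

Each printed pair agrees up to rounding. Because these two invented part scores are positively correlated, the variance of the sum exceeds $\sigma_u^2 + \sigma_v^2$ and that of the difference falls short of it; only when $r_{uv} = 0$ do both reduce to (37a).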

Now, with regard to the third method for interpreting r, let us note that in deviation units an observed y can be thought of as made up of 2 independent parts, the part which can be predicted from X, namely $y'$, and the residual or unpredictable part, $(y - y')$. Before going further we must demonstrate that $y'$ and $(y - y')$ are really independent. The numerator for the correlation between $y'$ and $(y - y')$ can be expressed as $\sum y'(y - y')$. But, since $y' = r\frac{\sigma_y}{\sigma_x}x$ and $(y - y') = y - r\frac{\sigma_y}{\sigma_x}x$, we have

$$
\begin{aligned}
\sum y'(y - y') &= \sum r\frac{\sigma_y}{\sigma_x}x\left(y - r\frac{\sigma_y}{\sigma_x}x\right) \\
&= r\frac{\sigma_y}{\sigma_x}\sum xy - r^2\frac{\sigma_y^2}{\sigma_x^2}\sum x^2 \\
&= r\frac{\sigma_y}{\sigma_x}\,Nr\sigma_x\sigma_y - r^2\frac{\sigma_y^2}{\sigma_x^2}\,N\sigma_x^2
\end{aligned}
$$

which is seen to be zero, since each term equals $Nr^2\sigma_y^2$; hence $y'$ and $(y - y')$ are uncorrelated. We have $y = y' + (y - y')$, whence, by the above variance theorem,

$$ \sigma_y^2 = \sigma_{y'}^2 + \sigma_{y\cdot x}^2 \tag{39} $$

in which $\sigma_{y\cdot x}^2$ is the variance of the residuals, $(y - y')$. If we divide both sides of this equation by $\sigma_y^2$, we get

$$ 1 = \frac{\sigma_{y'}^2}{\sigma_y^2} + \frac{\sigma_{y\cdot x}^2}{\sigma_y^2} \tag{39a} $$

from which we see that, since the 2 ratios add to unity, either one can be interpreted as a proportion (or a percentage by shifting the decimal point). Thus the ratio of $\sigma_{y'}^2$ to $\sigma_y^2$ is the proportion of the variance in Y which can be predicted from X, and the ratio of $\sigma_{y\cdot x}^2$ to $\sigma_y^2$ represents the proportion of the variation (variance) of Y which is left over or remains or cannot be predicted from X.





A little reflection as to the meaning of this residual variance should convince the student that we are here dealing with the same variance which results if we square formula (34), thus

$$ \sigma_{y\cdot x}^2 = \sigma_y^2(1 - r^2) $$

which means that

$$ \frac{\sigma_{y\cdot x}^2}{\sigma_y^2} = 1 - r^2 $$

When we substitute this value into (39a), we have

$$ 1 = \frac{\sigma_{y'}^2}{\sigma_y^2} + 1 - r^2 $$

from which it is readily seen that the ratio

$$ \frac{\sigma_{y'}^2}{\sigma_y^2} = r^2 $$

That is, the square of the correlation coefficient gives the proportion of the total variance of Y which is predictable from X, or $r^2$ measures the proportion of the Y variance which can be attributed to variation in X. The proportion of the variance of Y which is due to variables other than X is given by $1 - r^2$. By shifting decimals, we can think of $r^2$ as indicating a percentage, the percentage of variance which has been "explained," and $1 - r^2$ as the percentage of variance due to other causes. It will be noted that $r^2$, not r, can be so interpreted. This is true because variances are additive, whereas standard deviations are not. It should be emphasized that $r^2$ as a proportion has to do with variation expressed technically as variance.

It is of some interest to examine the meaning of $\sigma_{y'}^2$. It is the square of the standard deviation of the estimated values, and, with reference to the scatter diagram, $\sigma_{y'}$ corresponds approximately to what we would obtain if we were to compute the standard deviation about $M_y$ of the vertical array means, each weighted according to the number of cases in its array. As an exercise, the student can prove $r^2 = \sigma_{y'}^2/\sigma_y^2$ by determining directly, rather than by formula (34), that $\sigma_{y'}^2 = r^2\sigma_y^2$. (Hint: use the deviation score form of the regression equation.)
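The decomposition (39), the zero cross product, and the result $\sigma_{y'}^2 = r^2\sigma_y^2$ can all be checked on any set of paired scores. In this Python sketch the ten (X, Y) pairs are hypothetical, the helper functions are defined inline, and every variance uses the divisor N.

```python
import math


def mean(v):
    return sum(v) / len(v)


def sd(v):
    """Standard deviation with divisor N."""
    m = mean(v)
    return math.sqrt(sum((a - m) ** 2 for a in v) / len(v))


def pearson_r(xs, ys):
    mx, my = mean(xs), mean(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / len(xs)
    return cov / (sd(xs) * sd(ys))


# Hypothetical paired scores
X = [3, 5, 2, 8, 7, 4, 6, 9, 1, 5]
Y = [10, 14, 9, 20, 15, 12, 16, 19, 7, 11]

r = pearson_r(X, Y)
x = [a - mean(X) for a in X]                # deviation scores
y = [b - mean(Y) for b in Y]
slope = r * sd(Y) / sd(X)                   # r * sigma_y / sigma_x
y_pred = [slope * a for a in x]             # y' for each case
resid = [b - p for b, p in zip(y, y_pred)]  # y - y'

cross = sum(p * e for p, e in zip(y_pred, resid))
var_y, var_pred, var_resid = sd(y) ** 2, sd(y_pred) ** 2, sd(resid) ** 2

print(f"r^2 = {r * r:.4f}")
print(f"sigma^2_y' / sigma^2_y  = {var_pred / var_y:.4f}")   # equals r^2
print(f"sigma^2_y.x / sigma^2_y = {var_resid / var_y:.4f}")  # equals 1 - r^2
print(f"sum y'(y - y') = {cross:.1e}")                       # zero up to rounding
```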

This third method of interpreting a correlation coefficient assumes linearity of the regression line involved in predicting Y, or the dependent variable, from X as the independent variable; i.e., the regression of Y on X must be linear. If X were considered as the dependent variable, then the interpretation that $r^2$ indicates the proportion of the variance of X explained by Y would assume linearity for the regression of X on Y. The assumption of linearity becomes explicit if one proves directly that $\sigma_{y'}^2 = r^2\sigma_y^2$, and it was implied when we used $\sigma_{y\cdot x}^2$, in that this residual variance was taken about a straight line. This interpretation does not assume homoscedasticity, nor does it assume normality either for the marginal or for the array distributions.

The investigator who is interested in analyzing variation and its possible causes will prefer the interpretation of the correlation coefficient in terms of variance. The problem is frequently one in which an attempt is made to explain variation in one trait in terms of variation of another which is conceived of as being more basic. The use of $r^2$ as the percentage of the variance of a trait which is predictable by, or attributable to, variation in a second variable becomes a valuable tool in the analysis of variation. Of course one must use caution in assuming causation of one variable by another. Logic, not statistical method, must be invoked to determine whether a causal relationship exists, and the statistical interpretation modified accordingly. Variation in X might cause variation in Y, or vice versa, or variation in both X and Y might be due to the influence of some other variable or variables.

To illustrate the interpretation of $r^2$ as a percentage, let us suppose we have the performance of a group of school children on a substitution test. Considerable variation in scores will be present, and we may rightfully ask whether a portion of this variation is due to age differences. We can determine the correlation between age and performance. Suppose r = .60; this can be interpreted by saying that 36 per cent of the variance in performance is due to age differences, and 64 per cent is due to other causes. Likewise, the variance in crop yield due to variation in rainfall can be determined; or the variance in the height of a group of men may be analyzed into two or more parts, one of which might be the portion due to variation in the heights of their fathers.

CORRELATION AND COMMON ELEMENTS

A fourth possible interpretation of the correlation coefficient assumes that each of the two variables can be thought of as a summation of a number of equally potent, equally likely, independent elements, which can be either present or absent. Then the degree of correlation is a function of the number of elements common to the two variables. The general formula is

$$ r_{xy} = \frac{n_c}{\sqrt{n_x + n_c}\,\sqrt{n_y + n_c}} \tag{40} $$

in which $n_x$ equals the number of elements unique to X, $n_y$ the number unique to Y, and $n_c$ the number common to both variables. If the number of elements in X equals the number in Y, r gives the proportion of elements common to X and Y; if X is determined only by elements common to Y, while Y has additional elements, $r^2$ gives the proportion of elements entering into Y which determine X. There is little, if any, factual basis for believing that the assumptions stated above are tenable so far as psychological variables are concerned, and therefore the interpretation of the correlation coefficient in terms of common elements may be viewed with scepticism.
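Formula (40) and its two special cases can be checked with small, invented element counts. In this Python sketch `common_elements_r` is a hypothetical name, and the counts (6, 6, 4) and (0, 12, 4) are chosen only for illustration.

```python
import math


def common_elements_r(n_x: int, n_y: int, n_c: int) -> float:
    """Formula (40): n_c / (sqrt(n_x + n_c) * sqrt(n_y + n_c)).

    n_x, n_y: elements unique to X and to Y; n_c: elements common to both.
    """
    return n_c / (math.sqrt(n_x + n_c) * math.sqrt(n_y + n_c))


# Equal numbers of elements (10 each): r is the proportion held in common.
print(f"{common_elements_r(6, 6, 4):.3f}")       # 0.400 = 4/10
# X built only from elements it shares with Y: r**2 = n_c / (n_y + n_c).
r = common_elements_r(0, 12, 4)
print(f"{r ** 2:.3f}")                            # 0.250 = 4/16
```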


NORMAL CORRELATION 

A fifth interpretation of r is more mathematical but of little practical value. We have already seen how a frequency distribution and its polygon can be thought of as smooth, conforming perhaps to the equation of the normal curve. A correlation table is a frequency distribution, a picture or graph of which requires a third dimension. If we were to replace each tally in a scatter diagram by a thin block, there would result something analogous to the histogram except that it would be three-dimensional: the heights of the stacks of blocks would indicate the frequencies for the various cells. Now suppose that this mound of blocks is by some method smoothed to a surface, and we consider the total volume under the surface (between the surface and the XY plane) as representing N. Then the number of cases falling between two given X values and simultaneously between two given Y values will be approximately the volume of that portion of the mound which has as its base the rectangle or square formed by the intersections of the two X and two Y values. If the regression lines are linear, if the array distributions are normal and homoscedastic, and if the marginal distributions are normal, the resulting surface is termed the normal correlation surface, and the equation of the surface can be written as

$$ Z = \frac{N}{2\pi\sigma_x\sigma_y\sqrt{1 - r^2}}\; e^{-\frac{1}{2(1 - r^2)}\left(\frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2} - \frac{2rxy}{\sigma_x\sigma_y}\right)} \tag{41} $$

in which Z is the height of the surface above the point (x, y), x and y being deviation scores.

A number of important properties of the normal correlation surface can be deduced from this equation and its integral. For instance, the standard error of estimate can be derived from formula (41), and it can also be shown that the contour lines which represent different altitudes on the mound, i.e., different frequencies, will be concentric ellipses, and that if r = 0 (with $\sigma_x = \sigma_y$, as when both variables are in standard-score form), the contour lines will become concentric circles. If the equation is written with N equal to unity, by double integration the probability of an individual's falling between two particular Y values and between two X values can be determined. Tables are available which can be utilized for this purpose.†
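Formula (41) can be evaluated directly. The Python sketch below is an illustration, not a standard routine: `normal_surface` computes the height Z of the surface, and `volume` adds up heights over a fine grid (a crude midpoint rule over ±6 standard deviations) to confirm that with N = 1 the total volume is very nearly 1. The standard deviations 2 and 3 and the correlation .6 are arbitrary choices.

```python
import math


def normal_surface(x, y, sx, sy, r, n=1.0):
    """Height of the normal correlation surface, formula (41).

    x, y are deviation scores; sx, sy the standard deviations;
    r the correlation (|r| < 1); n the total frequency N.
    """
    q = (x * x / sx**2 + y * y / sy**2 - 2 * r * x * y / (sx * sy)) / (1 - r * r)
    return n / (2 * math.pi * sx * sy * math.sqrt(1 - r * r)) * math.exp(-q / 2)


def volume(sx, sy, r, n=1.0, step=0.05, half_width=6.0):
    """Midpoint-rule estimate of the volume under the surface."""
    k = round(2 * half_width / step)
    total = 0.0
    for i in range(k):
        x = (-half_width + (i + 0.5) * step) * sx
        for j in range(k):
            y = (-half_width + (j + 0.5) * step) * sy
            total += normal_surface(x, y, sx, sy, r, n)
    return total * (step * sx) * (step * sy)


print(round(volume(2.0, 3.0, 0.6), 4))   # very close to 1.0 when N = 1
```

With r = 0 the height factors into the product of two normal curves, one for X and one for Y, which is why the contours are then aligned with the axes.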


LIMITS FOR r 

Attention is called to the fact that definition formula (29) becomes $r = \sum z_x z_y / N$ when written in terms of standard scores for both variables. This indicates specifically that the correlation coefficient is a statistical average, the average of the cross products of standard scores. Suppose that we ask what happens when the correlation is perfect in the sense that each individual's $z_x$ score equals his $z_y$ score. If this is true, the sum $\sum z_x z_y$ would be the same as $\sum z^2$, which when divided by N gives 1.00, since the variance of standard scores is unity. Thus the upper limit for r is +1.00. Now suppose a perfect inverse relationship, such that an individual's $z_x$ and $z_y$ are the same except for sign, one being positive whereas the other is negative. If this holds true for all the cases, the sum $\sum z_x z_y$ can be written as $\sum z(-z)$ or $-\sum z^2$, which when divided by N gives −1.00 as the limit for perfect negative correlation.

As exercises, the student should show that multiplying or dividing either X or Y or both by a positive constant, or X by one positive constant and Y by another, will not change r (a negative multiplier reverses the sign of r but not its size), and that adding or subtracting a constant does not affect the value of r.
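Computing r literally as the mean cross product of standard scores makes the limits and the invariance exercises concrete. This Python sketch uses hypothetical scores; `z_scores` and `r_from_z` are illustrative names, and standard deviations use the divisor N so that $\sum z^2/N = 1$.

```python
import math


def z_scores(values):
    """Standard scores using the divisor-N standard deviation."""
    n = len(values)
    m = sum(values) / n
    s = math.sqrt(sum((v - m) ** 2 for v in values) / n)
    return [(v - m) / s for v in values]


def r_from_z(xs, ys):
    """r as the mean cross product of standard scores: sum(zx * zy) / N."""
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)


X = [4, 7, 1, 9, 5, 6]   # hypothetical scores
Y = [2, 5, 1, 8, 6, 4]
r = r_from_z(X, Y)

print(round(r, 4))
print(round(r_from_z(X, X), 4))                                  # perfect: +1
print(round(r_from_z(X, [-v for v in X]), 4))                    # perfect inverse: -1
print(round(r_from_z([3 * v + 10 for v in X],
                     [0.5 * v - 2 for v in Y]), 4))              # same as r
print(round(r_from_z([-2 * v for v in X], Y), 4))                # sign reversed
```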

† Pearson, Karl, Tables for Statisticians and Biometricians, Part II. Cambridge: Cambridge University Press, 1931. See Tables 8 and 9.




SUMMARY 

The five suggested methods for interpreting the correlation coefficient may be briefly summarized here.

1. r is associated with the rate at which one variable changes with another. This assumes that the regression line so interpreted is linear.

2. r tells us how accurately we can predict by a regression equation. The standard error of estimate permits one to infer the possible magnitude of the prediction error, whereas the coefficient of alienation indicates the reduction in error over that error which would exist if there were no correlation. This interpretation assumes that the regression line used in predicting is linear and that variation about this line is normal and homoscedastic.

3. $r^2$ gives the proportion of variance in Y predictable from, or attributable to, variation in X. This assumes linearity for the regression of Y on X and requires caution in assuming the direction of cause and effect.

The student should attempt to visualize the meaning of these three principal methods of interpreting correlation. In particular, he should note the meaning of $\sigma_y$, $\sigma_{y'}$, and $\sigma_{y\cdot x}$ (or their counterparts with the subscripts y and x interchanged). The first, $\sigma_y$, holds for the marginal distribution of all Y's; $\sigma_{y'}$ pertains to the variability of all Y values as predicted from X; the third, $\sigma_{y\cdot x}$, is a measure of the variation about the regression line for predicting Y from X.

4. r or $r^2$ can be interpreted in terms of the proportion of elements common to the two variables provided we are willing to make rather hazardous and unrealistic assumptions as regards the nature of the variables.

5. r can be interpreted mathematically in terms of the equation for the normal correlation surface. This assumes that both regressions are linear, that homoscedasticity and normality hold for both the horizontal and vertical array distributions, and that both marginal distributions are normal in form.

The nature of the investigation will usually dictate or suggest the appropriate interpretation. Ordinarily the fifth will not be used in connection with the application of the correlational method, whereas the fourth rests on assumptions which can seldom be met.



CHAPTER 10 

Factors Which Affect the Correlation Coefficient


Before we interpret, or draw conclusions from, a particular correlation coefficient, it is necessary that we ask ourselves: What factors might have affected its magnitude? The size of an obtained r depends upon several specific conditions, and, even though it is not always essential that corrections be applied, the investigator must forever be on the lookout for correlations which deviate from their "true" value because of the operation of disturbers. This chapter will be devoted to a discussion of the more common factors which influence r.

It is assumed that errors in computation have not been permitted, i.e., that all arithmetical work has been checked. It is also assumed that sufficient intervals have been used so as to make unnecessary the application of Sheppard's correction for grouping; if more than twelve intervals have been used, the slight increase in r which results from correcting the standard deviations will be negligible. Certain textbooks have advocated a correction to r for smallness of the sample, which correction reduces r by a negligible amount. In view of the magnitude of the effects of other factors on r, these two possible corrections seem trifling.

SELECTION 

One of the first questions which must be faced is: Do the cases upon which r is based represent a random sampling of some defined population, or have selective factors so operated as to increase or decrease r? The literature of psychology is not free from correlation coefficients which are decidedly different from values that would have been obtained had the sampling been random. This is not to say that any investigator has willfully selected his cases so as to produce correlation, but rather to say that unwitting errors are frequently present in spite of an effort to avoid selective factors.

SAMPLING ERRORS 

Even though one feels reasonably sure of the randomness of the sample upon which an r is based, it is still necessary to consider the obtained r in terms of variable errors due to sampling. Any r based on N pairs of observations will differ more or less from the universe, or population, value, which is here conceived of as the value of the correlation coefficient which we would obtain if we had an infinitely large sample. Many of the older texts gave $(1 - r^2)/\sqrt{N}$ as the standard error of r, but failed to point out a serious limitation as regards interpretation: that this is an approximation and that r's for successive samples are not distributed normally unless N is large and/or the universe value, $\bar r$, is near zero.

Before further discussion it should be said that some measure of the sampling fluctuation of the correlation coefficient is highly desirable for any of three reasons: (1) we may wish to say whether an obtained r can be taken as representing a real, nonchance, correlation, i.e., whether it deviates sufficiently far from zero so that we cannot regard it as a chance fluctuation from no relationship; (2) we may wonder whether a given r deviates significantly from some a priori or expected value; or (3) we may raise the question of whether two obtained r's are significantly different from each other. The answers to these questions must be in terms of probability, and the probability figure which we accept as indicating significance determines the confidence with which we regard any such conclusions as we set forth.

If N is greater than 30, and if we are interested in saying whether or not an r (of .50 or less, usually) is significantly different from zero, we can determine its standard error by

$$ \sigma_r = \frac{1}{\sqrt{N - 1}} \tag{42} $$

and then divide the obtained r by this standard error in order to secure an $x/\sigma$ value with which to enter the normal probability table. If $r/\sigma_r$ is greater than 2.58 we can conclude with a fairly high degree of sureness that the true or universe value of r is likely to be greater than zero.
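A minimal Python sketch of the large-sample test by formula (42); the function name and the example values r = .35 and N = 65 are hypothetical.

```python
import math


def r_over_sigma(r, n):
    """Critical ratio r / sigma_r with sigma_r = 1 / sqrt(N - 1), formula (42).

    Meant, as in the text, for N > 30 and r of about .50 or less.
    """
    return r * math.sqrt(n - 1)


ratio = r_over_sigma(0.35, 65)
print(f"r/sigma_r = {ratio:.2f}; exceeds 2.58: {ratio > 2.58}")
```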

For N less than 30, it is necessary to follow a different procedure. It can be shown that, if the correlation coefficient is computed for successive samples drawn from a population for which the correlation is zero, the successive values of

$$ t = \frac{r\sqrt{N - 2}}{\sqrt{1 - r^2}} = \frac{r}{\sqrt{(1 - r^2)/(N - 2)}} $$

will follow the t distribution with df = N − 2. If a sample t reaches the .01 level of significance, one would conclude that it is not a chance deviation from zero, or that some correlation exists between the 2 variables involved.
From the foregoing expression, it would appear that the t for testing the significance of correlation is nothing more than an $r/s_r$, with $s_r = \sqrt{(1 - r^2)/(N - 2)}$ as an estimate of the sampling error of r. However, there are subtle mathematical reasons why such an interpretation is not permissible.
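The small-sample statistic is a one-line computation. In this Python sketch, `t_for_r` is a hypothetical name and r = .70 with N = 15 is an invented example; the probability itself must still come from a t table (or a statistics package), since Python's standard library has no t distribution.

```python
import math


def t_for_r(r, n):
    """t = r * sqrt(N - 2) / sqrt(1 - r**2), returned with df = N - 2."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r), n - 2


t, df = t_for_r(0.70, 15)
print(f"t = {t:.3f}, df = {df}")   # compare with the tabled t for this df
```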
The student may wonder why the df is taken as N − 2. Actually, when we test the significance of an r, we are testing the significance of regression. If r is zero, the regression is zero in the sense that the regression coefficient or slope of the regression line is zero. Now a linear regression line involves 2 constants, its slope and its intercept; hence 2 degrees of freedom are lost in fitting the line. Suppose N = 2, and that the 2 X scores differ; likewise, the 2 Y scores. Imagine these pairs of scores plotted in a scatter diagram, and a regression line fitted or a correlation coefficient computed. The regression line would go through both plotted points; therefore for the sample of 2 cases the prediction would be perfect and r would be numerically unity. The student may, as an exercise, prove algebraically that, when N = 2 and when there is variation in both X and Y, the correlation must be +1 or −1. In other words, with N = 2 there is no freedom for sampling variation in the numerical value of r.
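The N = 2 case can also be seen empirically: whatever two pairs are drawn, r comes out +1 or −1. This Python sketch draws random pairs from a seeded generator; the scores are arbitrary and the helper `pearson_r` is defined inline.

```python
import math
import random


def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(1)
for _ in range(5):
    xs = [rng.uniform(0, 10) for _ in range(2)]   # 2 arbitrary X scores
    ys = [rng.uniform(0, 10) for _ in range(2)]   # 2 arbitrary Y scores
    print(f"{pearson_r(xs, ys):+.6f}")            # always +1 or -1
```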

Formulas for the standard error of r, when $\bar r$ is large, are misleading because for high values of $\bar r$ the distribution of successive sample values is markedly skewed. This skewness becomes noticeable when $\bar r$ reaches .40 or .30 and increases rapidly as $\bar r$ nears unity. The skewness is also a function of N. Because of this skewness the standard error of r loses its meaning; it cannot be expected to yield a trustworthy answer as to whether an obtained r deviates significantly from some a priori value, nor can the significance of the difference between 2 r's be determined by substituting in the ordinary formula for the standard error of a difference.
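A small Monte Carlo experiment makes the skewness visible. The Python sketch below rests on assumed settings (universe correlation .90, samples of N = 20, 4,000 samples, fixed seed); it draws bivariate normal samples, computes r for each, and reports the skewness of the sample r's alongside that of Fisher's z = atanh(r), the transformation taken up in formula (43).

```python
import math
import random


def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


def sample_r(rho, n, rng):
    """r for one sample of n pairs from a bivariate normal population."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [rho * x + math.sqrt(1 - rho * rho) * rng.gauss(0.0, 1.0) for x in xs]
    return pearson_r(xs, ys)


def skewness(values):
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


rng = random.Random(7)
rs = [sample_r(0.90, 20, rng) for _ in range(4000)]
zs = [math.atanh(r) for r in rs]
print(f"skewness of r: {skewness(rs):.2f}")   # clearly negative
print(f"skewness of z: {skewness(zs):.2f}")   # close to zero
```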

The r to z transformation. Professor R. A. Fisher has developed a very useful and accurate technique for handling sampling errors for high values of r. This procedure is also applicable for low r's and can be used when N is large or small. He employs a transformation

$$ z = \tfrac{1}{2}\log_e(1 + r) - \tfrac{1}{2}\log_e(1 - r) \tag{43} $$

or

$$ z = 1.1513\,\log_{10}\frac{1 + r}{1 - r} \tag{43a} $$

which has 2 distinct advantages: (1) the distribution of z for successive samples is independent of the universe value, i.e., for a given N the sampling distribution will have the same dispersion for all values of $\bar r$; (2) the distribution of z for successive samples is so nearly normal that it can be treated as such with very little loss of accuracy. The standard error of z is

$$ \sigma_z = \frac{1}{\sqrt{N - 3}} \tag{44} $$


' 1 


(44) 


If we wish to state the .99 confidence limits for $\bar r$, we transform the obtained r to z by formula (43a) or by Table B of the Appendix, determine $\sigma_z$, find $z + 2.58\sigma_z$ and $z - 2.58\sigma_z$, and then transform these 2 z values back to r's by using Table C. As an example, and in contrast to the less exact procedure of taking $r \pm 2.58\sigma_r$, where $\sigma_r = (1 - r^2)/\sqrt{N}$, let us suppose an r of .90 based on an N of 50. The standard error of r by the usual formula is .027, whence .90 ± (2.58)(.027) yields the values .830 and .970 as confidence limits for the universe value. Now, if we utilize the z transformation, we find z = 1.47, and $\sigma_z$ = .146, whence 1.47 ± (2.58)(.146) gives 1.093 and 1.847. These 2 values are then transformed back to the 2 r values, .798 and .951, which it will be noted differ from the confidence limits for $\bar r$ as determined by the classical method.
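The worked example can be reproduced in Python. In this sketch `fisher_z`, `z_to_r`, `z_limits`, and `classical_limits` are hypothetical names; `math.tanh` stands in for Table C, and `fisher_z` implements formula (43), which is the same function as `math.atanh`.

```python
import math


def fisher_z(r):
    """Formula (43): z = 1/2 ln(1 + r) - 1/2 ln(1 - r)."""
    return 0.5 * math.log(1 + r) - 0.5 * math.log(1 - r)


def z_to_r(z):
    """Inverse transformation (the role of Table C): r = tanh(z)."""
    return math.tanh(z)


def z_limits(r, n, crit=2.58):
    """Limits for the universe value via z, with sigma_z = 1 / sqrt(N - 3)."""
    z = fisher_z(r)
    se = 1 / math.sqrt(n - 3)
    return z_to_r(z - crit * se), z_to_r(z + crit * se)


def classical_limits(r, n, crit=2.58):
    """Older method: r +/- crit * (1 - r**2) / sqrt(N)."""
    se = (1 - r * r) / math.sqrt(n)
    return r - crit * se, r + crit * se


# The example in the text: r = .90, N = 50
print("classical:", [round(v, 3) for v in classical_limits(0.90, 50)])
print("via z:    ", [round(v, 3) for v in z_limits(0.90, 50)])
```

Because the text rounds z to 1.47, $\sigma_z$ to .146, and $\sigma_r$ to .027 before multiplying, the unrounded limits printed here may differ from .798, .951 and .830, .970 in the third decimal.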





Difference between r's. If we wish to determine the significance of the difference between 2 r's obtained from independent samples, both are transformed into z's and the standard error of the difference between the 2 z's is obtained by

$$ \sigma_{z_1 - z_2} = \sqrt{\frac{1}{N_1 - 3} + \frac{1}{N_2 - 3}} \tag{45} $$

and then the ratio of the difference to its standard error is treated in the usual manner. If the z's are significantly different, we conclude that the 2 r's are significantly different.
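A sketch of the test based on formula (45), in Python; `z_difference_ratio` is a hypothetical name, `math.atanh` is used as formula (43), and the two samples (r = .80 and r = .60, each with N = 103) are invented.

```python
import math


def z_difference_ratio(r1, n1, r2, n2):
    """(z1 - z2) / sigma_{z1 - z2}, formula (45); the samples must be independent."""
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (math.atanh(r1) - math.atanh(r2)) / se


ratio = z_difference_ratio(0.80, 103, 0.60, 103)
print(f"critical ratio = {ratio:.2f}")   # referred to the normal table
```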
Suppose we have the correlation between $X_1$ and $X_2$ and also between $X_1$ and $X_3$, with both r's based on the same sample of N cases, and we wish to decide whether there is a significant difference between $r_{12}$ and $r_{13}$. The foregoing method is not applicable because we need to allow for the fact that, for successive samplings, $r_{12}$ and $r_{13}$ are not independently distributed, but correlated. The standard error of the difference must include a subtractive term involving the correlation between the correlation coefficients. The methods for estimating this needed correlation are none too satisfactory, but there is a test which is interpretable by way of the t table for N small and by way of the normal table for N large. It has been shown that

$$ t = \frac{(r_{12} - r_{13})\sqrt{(N - 3)(1 + r_{23})}}{\sqrt{2\,(1 - r_{12}^2 - r_{13}^2 - r_{23}^2 + 2r_{12}r_{13}r_{23})}} $$

follows the t distribution with N − 3 degrees of freedom when the null hypothesis of no difference is true. If t is significant, we conclude that one variable correlates higher than the other with $X_1$.
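The t for two correlations sharing the variable $X_1$ (a formula commonly attributed to Hotelling) is sketched below in Python. `t_correlated_r` is a hypothetical name and the three correlations and N are invented for illustration.

```python
import math


def t_correlated_r(r12, r13, r23, n):
    """t for r12 versus r13 computed on the same N cases; df = N - 3."""
    det = 1 - r12**2 - r13**2 - r23**2 + 2 * r12 * r13 * r23
    t = (r12 - r13) * math.sqrt((n - 3) * (1 + r23)) / math.sqrt(2 * det)
    return t, n - 3


t, df = t_correlated_r(0.60, 0.40, 0.50, 50)
print(f"t = {t:.3f} with df = {df}")
```

The quantity `det` is the determinant of the 3 × 3 correlation matrix of $X_1$, $X_2$, $X_3$; it must be positive for the three r's to be mutually consistent.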
Averaging correlations. When we have 2 (or more) sample values for the correlation between 2 variables we may wish to average the r's (1) in case it is known that the samples have been drawn from the same population or (2) in case it can be assumed (because the r's are not significantly different from each other) that the samples have been drawn from equally correlated populations. An appropriate procedure is to convert each r to z, then take a weighted average of the z's, each z being weighted by the inverse of its sampling variance, i.e., by N − 3. Thus, for 3 sample values this weighted average is given by

$$ z_{av} = \frac{(N_1 - 3)z_1 + (N_2 - 3)z_2 + (N_3 - 3)z_3}{(N_1 - 3) + (N_2 - 3) + (N_3 - 3)} $$

This $z_{av}$ can be transformed back to an r, and any significance test concerning such an average r would be made on $z_{av}$, which has a standard error of

$$ \sigma_{z_{av}} = \frac{1}{\sqrt{(N_1 - 3) + (N_2 - 3) + (N_3 - 3)}} $$
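The averaging procedure generalizes directly to any number of samples. In this Python sketch `average_r` is a hypothetical name, the three r's and N's are invented, and `math.atanh`/`math.tanh` serve as the forward and inverse z transformation.

```python
import math


def average_r(rs, ns):
    """Weighted mean of Fisher z's (weights N - 3), back-transformed to r.

    Returns the average r, z_av, and the standard error of z_av.
    """
    weights = [n - 3 for n in ns]
    z_av = sum(w * math.atanh(r) for w, r in zip(weights, rs)) / sum(weights)
    se = 1 / math.sqrt(sum(weights))
    return math.tanh(z_av), z_av, se


r_av, z_av, se = average_r([0.45, 0.52, 0.38], [40, 63, 28])
print(f"average r = {r_av:.3f}, z_av = {z_av:.3f}, SE(z_av) = {se:.3f}")
```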


RANGE OR SPREAD OF TALENT 


The magnitude of the correlation coefficient varies with the degree of heterogeneity (with respect to the traits being correlated) of the sample. If we are drawing a sample from a group which is restricted in range with regard to either or both variables, the correlation will be relatively low. Thus the restricted range of intelligence is one factor which leads to lower correlation between intelligence and grades for college students than that usually found for high school groups. If the range with respect to 1 variable has been curtailed, and one knows the standard deviation for an uncurtailed distribution, it is possible to adjust the correlation for the difference in range, provided one can be sure of the tenability of 2 assumptions: that the regressions are linear and that the arrays are homoscedastic for the scatter based on the uncurtailed distribution. If the curtailment is in variable X, and we let

  $\sigma_x$ = SD for curtailed distribution,
  $S_x$ = SD for uncurtailed distribution,
  $r_{xy}$ = correlation of variable Y with X for curtailed range,
  $R_{xy}$ = correlation of variable Y with X for uncurtailed range,

the relationship by which we would predict $R_{xy}$ from $\sigma_x$, $S_x$, and $r_{xy}$ is given by

$$ R_{xy} = \frac{r_{xy}(S_x/\sigma_x)}{\sqrt{1 - r_{xy}^2 + r_{xy}^2\,(S_x^2/\sigma_x^2)}} \tag{46} $$

Obviously, if we have R instead of r, the value of r for a restricted range can be estimated by formula (46). All we need to do is interchange S and σ, R and r, and then substitute to find r. The estimation of r need not be made in ignorance of whether the assumptions of linearity and homoscedasticity can be met; an examination of the accessible scatter for the uncurtailed range will reveal the facts.




Formula (46) indicates definitely that the magnitude of the correlation coefficient is a function of the degree of heterogeneity with respect to one of the traits being correlated. A better appreciation of the extent of this influence can be had by examining Table 13, which gives, for varying values of $R_{xy}$ along the top and different ratios $\sigma_x/S_x$ along the left, the corresponding values of $r_{xy}$.

Table 13. Values of $r_{xy}$ for $R_{xy}$ of .30, .40, ..., .80 with Values of $\sigma_x/S_x$ of .90, .80, ..., .50

                            R_xy
    σx/Sx    .30    .40    .50    .60    .70    .80
     .90    .272   .366   .461   .559   .662   .768
     .80    .244   .330   .419   .514   .617   .730
     .70    .215   .292   .375   .465   .566   .682
     .60    .185   .253   .327   .410   .507   .625
     .50    .156   .213   .277   .351   .440   .555

It can be shown that double selection, i.e., curtailment on both variables, tends to depress the correlation coefficient. Since the formulas for "correcting" for double curtailment are not too satisfactory, none is given here.
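Formula (46) and its inverse are sketched below in Python, under the text's assumptions of linear regression and homoscedastic arrays. `corrected_r` and `restricted_r` are hypothetical names; passing 1.0 as the uncurtailed SD simply expresses $\sigma_x$ as a fraction of $S_x$, which is how Table 13 is laid out.

```python
import math


def corrected_r(r, sd_curtailed, sd_uncurtailed):
    """Formula (46): estimate R_xy for the uncurtailed range from r_xy."""
    k = sd_uncurtailed / sd_curtailed          # S_x / sigma_x
    return r * k / math.sqrt(1 - r * r + r * r * k * k)


def restricted_r(R, sd_curtailed, sd_uncurtailed):
    """Same formula with S and sigma (and R and r) interchanged."""
    return corrected_r(R, sd_uncurtailed, sd_curtailed)


# Two entries of Table 13 (sigma_x/S_x = .90 and .50)
print(round(restricted_r(0.30, 0.90, 1.0), 3))   # .272 in Table 13
print(round(restricted_r(0.80, 0.50, 1.0), 3))   # .555 in Table 13
```

Applying `corrected_r` to the output of `restricted_r` (with the same SDs) recovers the original R exactly, which confirms that the interchange described in the text inverts formula (46).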

One important rule emerges from the foregoing: standard deviations should always be reported along with correlation coefficients, and some indication should be given as to the variation typically found for the variables.

EFFECT OF UNRELIABILITY 

Before considering the effect of unreliability, or errors of measurement, upon the correlation between 2 variables, it is necessary that we digress to explain briefly what is meant by reliability. If we were assigned the task of determining the height of an individual by the use of a tape measure, we might be satisfied with 1 measurement, but unfortunately a single determination might not be entirely free from error. To overcome this, 2 or more measures are averaged on the assumption that the chance or variable errors will more or less cancel out. If one computes the standard deviation of the distribution of several measurements (of the same thing), a summary figure indicating the possible magnitude of the variable errors will be obtained. This σ neither pertains to nor measures the magnitude of a possible constant error, i.e., an error which affects all the measurements in the same direction. We are here concerned only with the magnitude of variable errors, or inaccuracies in measurement which are of a chance nature.

Reliability. If we had the problem of determining the error in the measurement of height, we could make several measurements on 1 person and compute a measure of accuracy, or we might make just 2 measures on each of several persons and take some function of the difference between the 2 measurements for all N individuals as our gauge of accuracy. Either scheme leads to an estimate of the size of the variable errors that may be involved.

In psychological measurement, it is not always feasible or possible to obtain more than 2 measures on an individual for a given trait; hence it is necessary to use the second-mentioned scheme for determining the accuracy of measurement. The mean or median absolute error may suffice, but, as in physical measurement, we sometimes need to know the extent of the variable errors in relation to the magnitude of the thing being measured, i.e., the relative or percentage error. Psychologists have found it useful to interpret variable errors, not with regard to the magnitude (a nearly meaningless word in psychological tests) of the measures, but relative to the variability of the trait for a specific group of individuals. The correlation between 2 determinations is, as we shall soon see, one method of expressing the accuracy of measurement relative to the trait dispersion. Such a correlation is termed the reliability coefficient.

Suppose X = an obtained score or measure for an individual,
  $X_\infty$ = his true score,
  e = a variable error, positive or negative.

Then we can consider that

$$ X = X_\infty + e $$

or in deviation units

$$ x = x_\infty + e $$

The variance of the obtained scores will be

$$ \sigma^2 = \sigma_\infty^2 + \sigma_e^2 \tag{47} $$

provided we can assume $x_\infty$ and e uncorrelated. This assumption seems reasonable since the variable error, e, is supposed to be a chance affair, as often positive as negative, and therefore its magnitude and direction should not be related to anything else. Equation (47) can be stated in words: the variance of the distribution of scores can be broken up into 2 portions, the variance of the true scores and the variance due to errors of measurement.
Suppose that for a given trait we have 2 measurements, each of which is in error but not necessarily to the same extent or in the same direction. Symbolically,

$$x_1 = x_\infty + e_1$$

$$x_2 = x_\infty + e_2$$

in which the $e$'s represent the errors which go with the 2 obtained scores. The reliability coefficient is defined as the correlation between 2 comparable measures of the same thing, i.e., the correlation between $x_1$ and $x_2$. (Each measured individual will have an $x_1$ and an $x_2$ score.) Thus we have the reliability coefficient,


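$$r_{11} = \frac{\sum x_1 x_2}{N\sigma_1\sigma_2} = \frac{\sum (x_\infty + e_1)(x_\infty + e_2)}{N\sigma_1\sigma_2} = \frac{\sum x_\infty^2 + \sum x_\infty e_2 + \sum x_\infty e_1 + \sum e_1 e_2}{N\sigma_1\sigma_2}$$

Dividing by $N$ gives

$$r_{11} = \frac{\sigma_\infty^2 + r_{\infty e_2}\sigma_\infty\sigma_{e_2} + r_{\infty e_1}\sigma_\infty\sigma_{e_1} + r_{e_1 e_2}\sigma_{e_1}\sigma_{e_2}}{\sigma_1\sigma_2} \qquad (48)$$

If we assume all three $r$'s in the numerator equal to zero, we have

$$r_{11} = \frac{\sigma_\infty^2}{\sigma_1\sigma_2}$$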

It is assumed that we are correlating comparable measures of the same thing or trait, comparable in the sense that $\sigma_{e_1} = \sigma_{e_2}$ and $\sigma_1 = \sigma_2$. (The same trait is implied in that $x_1$ and $x_2$ are measures of $x_\infty$.) Whence we have

$$r_{11} = \frac{\sigma_\infty^2}{\sigma_x^2} \qquad (49)$$

where $\sigma_x = \sigma_1 = \sigma_2$. The reliability coefficient can be interpreted as a proportion, since from formula (47) we have

$$\frac{\sigma_\infty^2}{\sigma_x^2} + \frac{\sigma_e^2}{\sigma_x^2} = 1$$

i.e., the reliability coefficient represents the proportion of the variance of the obtained scores which is due to the variance of the true scores. It follows that $1 - r_{11}$ gives the proportion of the variance which is due to errors of measurement.
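A quick Monte Carlo check can make formulas (47) and (49) concrete. The sketch below is only an illustration under assumed conditions: true scores are drawn from a normal distribution with $\sigma_\infty = 10$, each of 2 comparable forms adds an independent normal error with $\sigma_e = 5$, and the helper names are hypothetical. Under these assumptions (47) gives $\sigma_x^2 \approx 100 + 25 = 125$, and (49) predicts $r_{11} \approx 100/125 = .80$.

```python
import random
import statistics


def pearson_r(a, b):
    """Product-moment r: sum of deviation cross products over sqrt(sum a^2 * sum b^2)."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5


def simulate_parallel_forms(n, sd_true, sd_error, seed=0):
    """Hypothetical model: x1 = t + e1 and x2 = t + e2 with independent normal errors."""
    rng = random.Random(seed)
    true = [rng.gauss(50.0, sd_true) for _ in range(n)]
    x1 = [t + rng.gauss(0.0, sd_error) for t in true]
    x2 = [t + rng.gauss(0.0, sd_error) for t in true]
    return true, x1, x2


true, x1, x2 = simulate_parallel_forms(50_000, sd_true=10.0, sd_error=5.0)
var_true = statistics.pvariance(true)
var_x1 = statistics.pvariance(x1)
r11 = pearson_r(x1, x2)

# (47): obtained variance is close to true variance plus error variance.
print(f"var(x1) = {var_x1:.1f} (model value 125)")
# (49): the parallel-forms r is close to the share of true-score variance.
print(f"r11 = {r11:.3f}; var_true / var(x1) = {var_true / var_x1:.3f}")
```

With a finite sample the two printed ratios agree only approximately; the identity in (49) holds for the model, not for every sample.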

Obviously, the reliability coefficient can, by substitution from formula (49) into the above expression, also be written as

$$r_{11} = 1 - \frac{\sigma_e^2}{\sigma_x^2} \qquad (50)$$

which indicates clearly that the reliability coefficient is a function of the magnitude of the variable error relative to the variability of the trait in question. It also follows from formula (50) that the error of measurement can be stated in terms of the reliability coefficient and $\sigma_x$; thus,

$$\sigma_e = \sigma_x\sqrt{1 - r_{11}} \qquad (51)$$

That $\sigma_e$ is to be interpreted as the standard error of measurement may be clarified if we note that, when $x$ ($= x_1$ or $x_2$) is taken as evidence of the true score, $x - x_\infty$ becomes the error, and the standard deviation of such errors will be $\sigma_e$, as can be shown by easy algebra (an exercise). If it were possible to secure a large number of measures on an individual, we would expect these measures to distribute themselves normally about the true score with a standard deviation corresponding to $\sigma_e$. Thus, if the result of one testing yields an IQ of 80, and if $\sigma_e = 3$, we can conclude with high confidence that an individual's true position, on the scale of measured (obtained) IQ's, is somewhere between 71 and 89 ($80 \pm 3\sigma_e$), and with fair confidence that it is somewhere between 74 and 86 ($80 \pm 2\sigma_e$). It can readily be seen that the error of measurement expressed as a $\sigma$ has a distinct advantage over such concepts as the mean or median error, or the mean or median difference between 2 measurements, in that $\sigma_e$ enables us to use the probability table either in establishing confidence limits or in determining whether 2 scores differ more than is to be expected on the basis of chance.
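Formula (51) and the confidence bands in the IQ example reduce to a line or two of arithmetic. In the sketch below the function names are hypothetical, the bands use the normal-curve interpretation the text gives, and the case $\sigma_x = 15$, $r_{11} = .96$ is an invented illustration.

```python
import math


def standard_error_of_measurement(sd_x, r11):
    """Formula (51): sigma_e = sigma_x * sqrt(1 - r11)."""
    return sd_x * math.sqrt(1.0 - r11)


def true_score_band(score, sigma_e, k):
    """Interval score +/- k * sigma_e around a single obtained score."""
    return score - k * sigma_e, score + k * sigma_e


# The text's example: obtained IQ of 80 with sigma_e = 3.
print(true_score_band(80, 3, 3))  # (71, 89)
print(true_score_band(80, 3, 2))  # (74, 86), i.e., +/- 2 sigma_e

# Hypothetical test: sigma_x = 15 and r11 = .96 give sigma_e of about 3.
print(round(standard_error_of_measurement(15, 0.96), 2))  # 3.0
```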

There are 2 distinct situations for which one may wish to say whether 2 scores differ more than expected on the basis of error. First, consider 2 individuals, each with a score on a given test. The standard error of the difference between the scores is given by $\sigma_e\sqrt{2}$. Second, consider 1 person with some type of comparable standard scores, $Z_x$ and $Z_y$, on 2 tests having reliability coefficients $r_{xx}$ and $r_{yy}$, and errors represented by $\sigma_{e_x}$ and $\sigma_{e_y}$. The standard error (of measurement) for the difference score, $D = Z_x - Z_y$, is

$$\sigma_{e_D} = \sqrt{\sigma_{e_x}^2 + \sigma_{e_y}^2} = \sigma\sqrt{1 - r_{xx} + 1 - r_{yy}}$$

where $\sigma$ is the standard deviation common to the 2 sets of standard scores. For either situation, a difference between obtained scores divided by the appropriate standard error of the difference provides a CR for judging significance.
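Both standard errors of a difference, and the resulting CR, can be sketched as follows. The function names and the numbers in the usage lines ($\sigma = 10$, $r_{xx} = .90$, $r_{yy} = .80$, an observed difference of 12) are hypothetical; the first function assumes that the 2 individuals' errors are independent and share the same $\sigma_e$.

```python
import math


def se_diff_two_people(sigma_e):
    """Two individuals on one test: sqrt(sigma_e^2 + sigma_e^2) = sigma_e * sqrt(2)."""
    return sigma_e * math.sqrt(2.0)


def se_diff_one_person(sigma, r_xx, r_yy):
    """One person, 2 tests on a common standard-score scale with SD sigma."""
    return sigma * math.sqrt((1.0 - r_xx) + (1.0 - r_yy))


def critical_ratio(difference, se):
    """Obtained difference divided by its standard error."""
    return difference / se


# Hypothetical profile: standard scores with sigma = 10, r_xx = .90, r_yy = .80.
se = se_diff_one_person(10, 0.90, 0.80)
print(round(se, 2), round(critical_ratio(12, se), 2))
```

The CR is then referred to the normal probability table in the usual way.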

Since difference scores of the second type frequently enter into clinical diagnosis or are used as a basis for guidance, it is of interest to know that the reliability coefficient for difference scores is

$$r_{dd} = \frac{r_{xx} + r_{yy} - 2r_{xy}}{2 - 2r_{xy}} \qquad (52)$$

Even though both $r_{xx}$ and $r_{yy}$ are satisfactorily high, the difference scores may have far from satisfactory reliability. If, for example, $r_{xx} = r_{yy} = .90$ and $r_{xy} = .70$, the value of $r_{dd}$ is only .67.
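Formula (52) is easy to tabulate, and doing so shows how quickly the reliability of difference scores falls as the 2 tests become more highly intercorrelated. The sketch assumes, as formula (52) does, standard scores with equal standard deviations; the function name is hypothetical.

```python
def difference_score_reliability(r_xx, r_yy, r_xy):
    """Formula (52): reliability of D = Zx - Zy for standard scores with equal SDs."""
    return (r_xx + r_yy - 2.0 * r_xy) / (2.0 - 2.0 * r_xy)


# The text's example: r_xx = r_yy = .90 and r_xy = .70.
print(round(difference_score_reliability(0.90, 0.90, 0.70), 2))  # 0.67

# Same test reliabilities, increasing intercorrelation between the 2 tests.
for r_xy in (0.0, 0.50, 0.70, 0.85):
    print(r_xy, round(difference_score_reliability(0.90, 0.90, r_xy), 3))
```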

Determination of reliability. The above argument regarding the interpretation of the reliability coefficient either as an indicator of relative accuracy or in terms of $\sigma_e$ rests on the supposition that we have obtained the reliability coefficient as the result of correlating comparable measures of the same thing and that the variable errors are uncorrelated with themselves and with the true scores. The practical determination of the reliability coefficient involves more, therefore, than the mere correlating of 2 sets of measurements. The conditions under which the 2 sets of scores are obtained must be scrutinized for possible violation of the requisite assumptions. Some of the difficulties involved in ascertaining the reliability of a psychological measurement are suggested in the following paragraphs.

First let us note that the chance variable error, $e$, can be broken up into many smaller components, at least logically, although not necessarily experimentally. Thus we might set

$$e = e_a + e_b + e_c + e_d + e_f + \cdots$$

in which $e_a$ = error in the instrument or test
$e_b$ = error due to extraneous physical disturbance
$e_c$ = error due to physiological condition of individual
$e_d$ = error in scoring or in reading instrument
$e_f$ = error due to day-to-day fluctuations.

Other sources of variable error might be added, or some of those listed might be broken up into more minute parts. It is not assumed that these several sources contribute an equal amount to the variance of $e$, nor is it assumed that these several components are entirely independent of each other. For instance, daily fluctuations might be influenced by physiological condition.
The assumption of uncorrelated errors implies that $e_1$ is not correlated with $e_2$. Of course the 2 scores for an individual might, by chance, contain a variable error of the same magnitude and sign; we are here interested, however, in whether an error which is chance for one score might tend in general to affect the second score in the same manner. For example, an upset stomach might lead to a reduced performance score, and if the second test was administered the same day, this same chance factor would affect the second performance score in the same direction. Thus in examining any proposed scheme for determining the reliability of a test we must inquire as to whether any of the sources of error can affect the 2 measurements on an individual in the same direction. If it seems reasonable to suspect that errors are correlated, it follows that the obtained reliability coefficient will be spuriously high, since the presence of correlated errors will not allow formula (48) to be reduced to (49).

Let us consider a few of the "accepted" schemes for ascertaining reliability in order to see whether they are "acceptable" in light of the assumptions requisite to a sound reliability coefficient. These assumptions may be recapitulated in the form of 3 questions. Do the 2 tests or determinations represent measures of the same thing? Are the 2 series of measures comparable (comparable tests or instruments)? Is it possible or likely that the errors of measurement are correlated; i.e., can the error on the first test be correlated with the error on the second, or can the error on either be correlated with the true measure?

For the ordinary mental, personality, or achievement test, reliability is usually ascertained by correlating supposedly equivalent (comparable) forms, by correlating split halves (odd vs. even items, or first half vs. second half of test), or by correlating test-retest scores. The test-retest method is of limited value in that there may be a memory carry-over from test to retest, in which case the retest will measure the same trait as the original test plus memory effects. In order to overcome this memory transfer, the retest may be administered some months after the first test, but this permits a possible change in the trait or ability as a result of maturation or experience.

Split-half reliability involves the correlating of 2 halves and applying the Brown-Spearman formula to determine the reliability of scores based on the whole test. This formula is easily derived. Let $X_1$ and $X_2$ stand for the respective halves. Now $r_{12}$ would be the reliability for scores based on either half, but in practice we always use total scores, defined as $X_a = X_1 + X_2$. The reliability of $X_a$ can be thought of in terms of the correlation between $X_a$ and an imaginary set of comparable scores, $X_b = X_3 + X_4$, where $X_3$ and $X_4$ are scores on the 2 respective halves of a nonexistent form of the test. Given information about $X_1$ and $X_2$, we seek an expression for $r_{ab}$. In deviation units, $x_a = x_1 + x_2$ and $x_b = x_3 + x_4$; hence we may write

$$r_{ab} = \frac{\sum x_a x_b}{N\sigma_a\sigma_b} = \frac{\sum (x_1 + x_2)(x_3 + x_4)}{N\sigma_a\sigma_b} = \frac{\sum x_1x_3 + \sum x_1x_4 + \sum x_2x_3 + \sum x_2x_4}{N\sigma_a\sigma_b}$$

Dividing through by $N$ and utilizing formula (29), and with formula (37) as a basis for specifying $\sigma_a$ and $\sigma_b$, we have

$$r_{ab} = \frac{r_{13}\sigma_1\sigma_3 + r_{14}\sigma_1\sigma_4 + r_{23}\sigma_2\sigma_3 + r_{24}\sigma_2\sigma_4}{\sqrt{\sigma_1^2 + \sigma_2^2 + 2r_{12}\sigma_1\sigma_2}\,\sqrt{\sigma_3^2 + \sigma_4^2 + 2r_{34}\sigma_3\sigma_4}}$$

Now it is assumed that the $X_1$ and $X_2$ scores are comparable (equivalent sets, with $\sigma_1 = \sigma_2$), and we simply say that our imaginary scores, $X_3$ and $X_4$, are comparable with each other and also with $X_1$ and $X_2$; hence all 4 $\sigma$'s have the same value, and therefore cancel out, leaving

$$r_{ab} = \frac{r_{13} + r_{14} + r_{23} + r_{24}}{\sqrt{2 + 2r_{12}}\,\sqrt{2 + 2r_{34}}}$$

Comparable or equivalent sets of scores will correlate equally with each other; that is, the 5 unknown $r$'s in this expression will all equal $r_{12}$, our known value. Therefore we have

$$r_{ab} = \frac{4r_{12}}{\sqrt{2 + 2r_{12}}\,\sqrt{2 + 2r_{12}}} = \frac{2r_{12}}{1 + r_{12}} \qquad (53)$$

as the reliability of scores based on the whole test.
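The split-half procedure, with formula (53) applied to an odd-even split, can be sketched as below. The item matrix is invented for illustration (rows are persons, columns are items scored 0 or 1), and the function names are hypothetical; as the text stresses, the result is meaningful only if the 2 halves are comparable.

```python
import math
import statistics


def pearson_r(a, b):
    """Product-moment correlation of 2 equal-length lists."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def spearman_brown_double(r_half):
    """Formula (53): whole-test reliability from the r between 2 comparable halves."""
    return 2.0 * r_half / (1.0 + r_half)


def split_half_reliability(item_scores):
    """Odd-even split of each person's items, correlate the half scores, then apply (53)."""
    odd = [sum(items[0::2]) for items in item_scores]   # items 1, 3, 5, ...
    even = [sum(items[1::2]) for items in item_scores]  # items 2, 4, 6, ...
    return spearman_brown_double(pearson_r(odd, even))


# Hypothetical 0/1 responses of 5 persons to 6 items.
responses = [
    [1, 1, 1, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0],
    [1, 0, 1, 1, 1, 1],
]
print(round(split_half_reliability(responses), 3))
```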

The only assumption underlying formula (53) is that the 2 halves being correlated are comparable (equivalent or parallel). If the test items have been arranged according to difficulty, a first-half vs. second-half reliability will not satisfy the notion of comparable measures. Ordinarily the odd-even item technique will satisfy the criteria of comparability and sameness of trait. Neither of the split-half methods will satisfy the assumption of uncorrelated errors. Since both measures are determined at the same sitting, any chance fluctuations due to physiological conditions or to chance factors in the test situation will influence the 2 scores of an individual in the same direction. It is to be expected, therefore, that the correlation of halves will in general lead to a reliability coefficient which is too high, thus giving an exaggerated notion of the accuracy with which we can place an individual on the trait continuum.


By far the best method for determining the reliability of a test is to have 2 forms which have been made equivalent and comparable by careful selection and balancing of items. No item in one form should be so nearly identical with an item in the other form as to permit a direct memory transfer. Two forms, equivalent yet not identical, can be administered within, say, 2 weeks' time, a procedure which properly includes in the estimate of variable error the daily fluctuations due to either physiological or psychological conditions and variations due to chance factors in the physical situation in which the tests are given. With so short an interval between testings, the trait being measured will have changed only a negligible amount as a result of maturation or ordinary environmental influences.

When we attempt to obtain the reliability of a learning score or of any performance which is influenced by practice, we encounter difficulties which are baffling to the researcher who rigorously adheres to the fundamental requisites of the reliability coefficient. The chief difficulty is the obvious fact that the "thing" being measured changes as a result of each measurement or trial. Test-retest, or first half vs. second half (of trials), or today's trials vs. tomorrow's will not represent measures of the same function, nor will any scheme analogous to equivalent forms avoid this difficulty, since "forms" which are comparable will permit transfer. The use of scores on odd vs. even trials will have the advantage of balancing somewhat the influence of practice, especially if several trials are given, but the possibility that a chance error affects odds and evens alike is present, in that a slip in the experimental procedure or a temporary discouragement on the part of the testee or the adoption by the subject of a poor approach to the problem will have a similar effect on both scores. If trials were spaced, say, a day apart, the factors just mentioned might not greatly disturb the reliability determination. In general, it can be said that the odd-even trial method will yield a reliability coefficient which is higher than the "true" reliability.

The same shortcomings are present in the aforementioned methods when they are employed in determining the reliability of animal (or human) maze-learning scores. Other techniques, peculiar to the maze situation, have been proposed. Performances on the odd and even blinds, somewhat similar to odd and even items, have been correlated for the purpose of reliability, but since blinds differ considerably as regards difficulty, one cannot be sure that the 2 halves are comparable. One can also question the comparability of the first half and second half of the maze, since in general the last part tends to be learned more quickly than the first. Attempts to ascertain the reliability of one maze by correlating performance on it with that on another maze involve several difficulties. In the first place, there seems to be a general positive transfer (perhaps a general adaptation to the maze situation) from a first to a second maze; secondly, the second maze must be similar to the first in order to satisfy the requisite of comparable measures of the same ability, but if this similarity approaches identity the second maze becomes a retest; and thirdly, a close degree of similarity will lead to possible interference effects which may act differentially from animal to animal.

The foregoing brief discussion of the requisites for, and difficulties in arriving at, a meaningful reliability coefficient should make obvious the necessity for examining critically any proposed method of determining the reliability of a psychological measurement. The interpretation of the reliability coefficient in terms of the standard error of measurement definitely assumes homoscedasticity, which is another way of saying that the reliability coefficient is valid only when the error of measurement is of the same order of magnitude for the entire range of scores. That this may not always hold true is evident from findings with the 1937 Stanford Revision of the Binet Test.

It should be noted in passing that the magnitude of the reliability coefficient is influenced by the trait homogeneity of the sample upon which it is based. Let $\sigma$ represent the standard deviation for the restricted range, $\Sigma$ the standard deviation for the unrestricted range, $r_{11}$ the reliability for the restricted range, and $R_{11}$ the reliability for the unrestricted range. It may be assumed that $\sigma_e$ for the smaller range equals $\sigma_e$ for the larger range, i.e.,

$$\sigma^2(1 - r_{11}) = \Sigma^2(1 - R_{11}) \qquad (54)$$

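This is the usual formula given for relating reliability to amount of variability. It can be argued, however, that the important consideration is the relationship of the reliability coefficient to the amount of true variance; hence the formula should be written in the form

$$\frac{\sigma_\infty^2}{r_{11}}(1 - r_{11}) = \frac{\Sigma_\infty^2}{R_{11}}(1 - R_{11}) \qquad (54a)$$

where $\sigma_\infty^2$ and $\Sigma_\infty^2$ denote the true-score variances for the restricted and unrestricted ranges.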
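Formula (54) can be solved for the reliability to be expected in a group of different variability, on the stated assumption that $\sigma_e$ is the same in both groups. The sketch below does this; the function name and the numbers ($r_{11} = .80$ in a group with $\sigma = 10$) are hypothetical.

```python
def reliability_for_other_spread(r11, sd_observed, sd_other):
    """Solve sigma^2 (1 - r11) = Sigma^2 (1 - R11) for R11, assuming equal error variance."""
    error_variance = sd_observed ** 2 * (1.0 - r11)
    return 1.0 - error_variance / sd_other ** 2


# r11 = .80 found in a group with sigma = 10 (hypothetical numbers).
print(round(reliability_for_other_spread(0.80, 10, 20), 3))  # more heterogeneous group: 0.95
print(round(reliability_for_other_spread(0.80, 10, 5), 3))   # more homogeneous group: 0.2
```

The same error variance that is small relative to a wide spread of scores becomes large relative to a narrow one, which is the whole content of (54).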

Attenuation. Now we return to the question which led to this lengthy detour: How does unreliability affect the correlation between variables? Let

$$x = x_\infty + e$$

$$y = y_\infty + d$$

where $e$ and $d$ represent the variable errors in the two scores, $x$ and $y$. Then

$$r_{xy} = \frac{\sum (x_\infty + e)(y_\infty + d)}{N\sigma_x\sigma_y} = \frac{\sum x_\infty y_\infty + \sum x_\infty d + \sum y_\infty e + \sum ed}{N\sigma_x\sigma_y}$$

If we assume that $d$ is uncorrelated with $x_\infty$, that $e$ is uncorrelated with $y_\infty$, and that $e$ and $d$ are uncorrelated, we have

$$r_{xy} = \frac{\sum x_\infty y_\infty}{N\sigma_x\sigma_y} = \frac{r_{x_\infty y_\infty}\,\sigma_{x_\infty}\sigma_{y_\infty}}{\sigma_x\sigma_y}$$

where $r_{x_\infty y_\infty}$ is the correlation between true scores. Since $\sigma_\infty = \sigma\sqrt{r_{11}}$ by formula (49),

$$r_{xy} = r_{x_\infty y_\infty}\sqrt{r_{xx}\,r_{yy}} \qquad (55)$$

which, since the reliability coefficients are less than unity, shows clearly that the correlation between obtained scores will be less than that between true scores; i.e., errors of measurement tend to reduce or attenuate the correlation between traits.

One can rearrange formula (55) as

$$r_{x_\infty y_\infty} = \frac{r_{xy}}{\sqrt{r_{xx}\,r_{yy}}} \qquad (56)$$

by which one can estimate what the correlation would be if perfect, errorless measures were available. This is known as correction for attenuation. Correlation coefficients corrected for attenuation are of theoretical importance in the analysis of relationships in that allowance can be made for variable errors of measurement, but such corrected $r$'s are of little practical value since they cannot be used in prediction equations. The prediction of one variable from another and the accompanying error of estimate must necessarily be based on obtained, or fallible, rather than true scores.
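Formulas (55) and (56) are inverses of each other, and a short sketch shows both directions along with the ceiling that unreliability places on an obtained $r$. The function names and all numerical values are hypothetical; the last line illustrates how low reliabilities can push a corrected $r$ above unity, one of the warning signs the text discusses.

```python
import math


def attenuated_r(r_true, r_xx, r_yy):
    """Formula (55): obtained r implied by a true-score r and the 2 reliabilities."""
    return r_true * math.sqrt(r_xx * r_yy)


def corrected_r(r_xy, r_xx, r_yy):
    """Formula (56): correction for attenuation (an estimate, not a prediction tool)."""
    return r_xy / math.sqrt(r_xx * r_yy)


# Hypothetical values: r_xy = .45 with reliabilities .81 and .64.
print(round(corrected_r(0.45, 0.81, 0.64), 3))  # 0.625

# Ceiling on an obtained r when the true-score r is 1.
print(round(attenuated_r(1.0, 0.81, 0.64), 2))  # 0.72

# Low reliabilities can push the "corrected" r past unity.
print(round(corrected_r(0.60, 0.60, 0.50), 3))
```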

Since the correlation between variables is a function of the reliability of their measurement, we may examine the limits imposed upon $r$ as a result of fallible scores. By reference to formula (55), we observe that, if the correlation between true scores is unity and if the reliability for 1 variable is perfect, the obtained correlation between the 2 cannot exceed the square root of the reliability coefficient for the other variable. If the correlation between the true scores is perfect and if each variable is subject to errors of measurement, then the obtained correlation cannot exceed the product of the square roots of the 2 reliability coefficients. Obviously, if the reliabilities are the same, the obtained correlation cannot be greater than the reliability coefficient.
In addition to the assumptions which were made specifically in deriving the formula for correcting for attenuation, it is also necessary to meet all the assumptions required for a sound reliability coefficient. Since obtained correlations and also reliability coefficients are functions of the homogeneity, with respect to the 2 traits, of the sample upon which they are based, it follows that the reliability coefficients used in correcting an obtained $r$ should be based on the same sample as $r$ or on a sample which is of comparable homogeneity. Corrected $r$'s greatly in excess of unity have been reported. Such absurd results lead one to ask whether the assumptions have been met, but this question should be raised concerning any corrected $r$, even though it does not exceed unity, since the assumptions are difficult to meet. It has been said that a corrected $r$ can legitimately exceed unity by as much as 2 or 3 times its sampling error. Formulas for the standard error of a corrected $r$ are available, but nothing is known concerning the nature of the distribution of corrected $r$'s for successive samples. Presumably this distribution would be markedly skewed for high values; hence the use of an ordinary standard error technique to determine whether a corrected $r$ exceeds unity (or any other magnitude) by more than can reasonably be expected on the basis of sampling is an unsound procedure.


INDEX CORRELATION 

A possible source of error in correlational work may be introduced when 2 indexes having a common variable denominator are correlated, such as $X/Z$ and $Y/Z$. Before considering this special case, it might be well to turn our attention to more general formulas for indexes. These formulas involve the coefficient of variation, namely, $v = \sigma/M$, and their use leads to serious error when the $v$'s are large, $v^3$ and higher-power terms having been dropped in the derivations.



Although these formulas are very useful for determining means, sigmas, and the correlations for ratios in terms of means, sigmas, and correlation coefficients for the original variables, their use is somewhat limited in that generally one cannot know whether the index distribution is normal, nor can one make a statement concerning linearity and homoscedasticity for the correlation between 2 indexes. Such information, if needed, must be obtained by first determining the numerical value of the indexes for each individual and then making distributions.

Several special cases can be deduced from formula (59). Thus the correlation between $X_1/X_3$ and $X_2$ is exactly equivalent to that between $X_1/X_3$ and $X_2/1$; i.e., $X_4$ is set equal to 1, which makes $v_4 = 0$, and therefore all terms in (59) involving the subscript 4 vanish. The correlation between $X_1/X_3$ and the reciprocal of a variable would be obtained by setting $X_2 = 1$, i.e., letting $1/X_4$ be the reciprocal; then $v_2 = 0$, whence the desired formula can be obtained by dropping all terms involving $v_2$. Likewise the correlation can be deduced for $1/X_3$ with $1/X_4$, for $1/X_3$ with $X_2$, and for $X_1/X_3$ with $X_2/X_3$. This last correlation is of particular interest because it is possible to find a relationship between these 2 indexes even though the 3 original variables are uncorrelated.

By substituting $X_3$ for $X_4$, i.e., replacing subscript 4 by 3, an expression for the correlation of indexes having a common variable denominator can readily be obtained. It will be



$$r_{(X_1/X_3)(X_2/X_3)} = \frac{r_{12}v_1v_2 - r_{13}v_1v_3 - r_{23}v_2v_3 + v_3^2}{\sqrt{v_1^2 + v_3^2 - 2r_{13}v_1v_3}\,\sqrt{v_2^2 + v_3^2 - 2r_{23}v_2v_3}} \qquad (60)$$

If $r_{12} = r_{13} = r_{23} = 0$, this becomes

$$r_{(X_1/X_3)(X_2/X_3)} = \frac{v_3^2}{\sqrt{v_1^2 + v_3^2}\,\sqrt{v_2^2 + v_3^2}}$$

and if the $v$'s are equal, the value of the index correlation will be .50 even though there is no relationship between the original variables. This is known as spurious correlation due to indexes. There are instances, however, in which an analysis of the interrelations of ratios is of just as much import as the analysis of the variables from which the indexes are obtained, and therefore it does not follow that the correlation between ratios having a common denominator is necessarily misleading.
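Formula (60) and the .50 result can be checked 2 ways: by evaluating the approximation directly, and by simulating 3 mutually independent variables and correlating the 2 ratios. The simulation settings (normal variables with mean 100 and $\sigma = 10$, so every $v = .10$) and the function names are assumptions for illustration; since (60) drops higher-order terms in $v$, the simulated value should be close to, not exactly, .50.

```python
import math
import random
import statistics


def pearson_r(a, b):
    """Product-moment correlation of 2 equal-length lists."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def common_denominator_r(r12, r13, r23, v1, v2, v3):
    """Formula (60): approximate r between X1/X3 and X2/X3."""
    num = r12 * v1 * v2 - r13 * v1 * v3 - r23 * v2 * v3 + v3 ** 2
    den = (math.sqrt(v1 ** 2 + v3 ** 2 - 2 * r13 * v1 * v3)
           * math.sqrt(v2 ** 2 + v3 ** 2 - 2 * r23 * v2 * v3))
    return num / den


def simulated_ratio_r(n=20_000, mean=100.0, sd=10.0, seed=3):
    """Correlate X1/X3 with X2/X3 for 3 independent normal variables."""
    rng = random.Random(seed)
    x1 = [rng.gauss(mean, sd) for _ in range(n)]
    x2 = [rng.gauss(mean, sd) for _ in range(n)]
    x3 = [rng.gauss(mean, sd) for _ in range(n)]
    return pearson_r([a / c for a, c in zip(x1, x3)],
                     [b / c for b, c in zip(x2, x3)])


print(round(common_denominator_r(0, 0, 0, 0.1, 0.1, 0.1), 3))  # 0.5
print(round(simulated_ratio_r(), 2))
```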

It has been asserted that the correlation between IQ's derived from 2 tests or 2 forms of the same test will be spuriously high because of the common variable denominator, age. It can be shown, however, that such a correlation will not be spurious unless the 2 sets of IQ's are correlated with age. If the IQ-vs.-age correlations are both positive or both negative, the index correlation will be spuriously high; if one is negative and the other positive, spuriously low. Thus, rather than make a blanket statement to the effect that the correlation between IQ's is spuriously high, we should say that it can be spuriously high or low or not spurious at all, according to the IQ-vs.-age correlations. It should be remembered that, even though the IQ's based on an ideal (properly constructed and standardized) test will be uncorrelated with age, a nonzero relationship might be produced for a single school-grade group by the selective factors that operate in age-grade location. Within a single grade group in a school system where acceleration is permitted, the younger children are likely to be the brighter, i.e., have the higher IQ's, thus producing negative correlations for sets of IQ's with age, and consequently a spuriously high correlation between IQ's.



PART-WHOLE CORRELATION

Another type of spurious correlation arises when a total score is correlated with a subscore which is a part of the total score. Suppose that a total score is made up of 3 parts, $X_t = X_1 + X_2 + X_3$, and that we correlate $X_1$ against $X_t$. Ordinarily in such situations the components will themselves be correlated positively. It should be obvious that the extent to which $X_1$ correlates with $X_t$ is more or less dependent upon the fact that $X_t$ includes $X_1$. It does not follow, however, that a high value for $r_{1t}$ is not meaningful, even though spurious. For instance, a high value for $r_{1t}$ would, regardless of spuriousness, justify the use of $X_1$ in lieu of the battery of 3 subtests. There are times when one may wish to know how highly a subtest correlates with a total, based on any number of parts, minus the subtest. This correlation is given by

$$r_{1(t-1)} = \frac{r_{1t}\sigma_t - \sigma_1}{\sqrt{\sigma_t^2 + \sigma_1^2 - 2r_{1t}\sigma_t\sigma_1}} \qquad (61)$$
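Formula (61) can be checked against a case where the answer is known in advance. In the hypothetical example below, 3 uncorrelated parts each have $\sigma = 1$, so $\sigma_t = \sqrt{3}$ and $r_{1t} = 1/\sqrt{3} \approx .58$, a sizable part-whole $r$ even though $X_1$ is unrelated to the rest of the total; formula (61) duly returns zero. The function name is an illustrative choice.

```python
import math


def part_vs_remainder_r(r_1t, sd_t, sd_1):
    """Formula (61): r between part X1 and the remainder Xt - X1."""
    num = r_1t * sd_t - sd_1
    den = math.sqrt(sd_t ** 2 + sd_1 ** 2 - 2 * r_1t * sd_t * sd_1)
    return num / den


# 3 uncorrelated parts with unit SDs: var(Xt) = 3 and cov(X1, Xt) = 1.
sd_t = math.sqrt(3.0)
r_1t = 1.0 / sd_t
print(round(r_1t, 3))  # 0.577
print(math.isclose(part_vs_remainder_r(r_1t, sd_t, 1.0), 0.0, abs_tol=1e-12))  # True

# 2 parts with unit SDs and r12 = .5: the remainder is X2, so the answer should be .5.
print(round(part_vs_remainder_r(1.5 / math.sqrt(3.0), math.sqrt(3.0), 1.0), 6))
```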


HETEROGENEITY WITH RESPECT TO A THIRD VARIABLE 

We have already discussed the influence on $r$ of heterogeneity with regard to one or both of the variables being correlated. Suppose variables $X_1$ and $X_2$ are 2 different traits, each of which is related to age as the third variable. Then an older individual will tend to be higher on both tests than a younger individual. In other words, heterogeneity with respect to age will tend to produce correlation between $X_1$ and $X_2$, and our present problem is to develop a method by which we can estimate what the correlation between $X_1$ and $X_2$ would be if age were constant.

Suppose $r_{12}$, $r_{13}$, $r_{23}$, and the several means and standard deviations are known; then let us visualize the 3 scatter diagrams. The scatter for $r_{12}$ will be somewhat elongated as a result of the influence of age, since variation in both $X_1$ and $X_2$ is here supposed to be partly due to age variation. What is needed is the correlation between measures of $X_1$ and $X_2$ which has been freed from the influence of age. If we were to express each $X_1$ in the first array of the scatter for $r_{13}$ as a deviation from the mean of this array, and were to do the same for all other $X_1$'s in the scatter (each as a deviation from the mean of the array in which it falls), we would have scores expressed as deviations from the means of the several ages. These deviations will be independent of age. As an example, suppose an 8-year-old individual scores 28 and the mean of 8-year-olds is 25, and a 14-year-old individual scores 54 and the mean of 14-year-olds is 51. The second individual scores higher than the first because he is older, but each would have a deviation (from his own age mean) of plus 3. Obviously, if we also expressed the $X_2$ scores as deviations from the averages for the several ages, they too would be independent of age influences. Now, if we correlated these deviations (from age means) we would be correlating sets of $X_1$ and $X_2$ scores which would be free from age, and hence we would arrive at a correlation between variables $X_1$ and $X_2$ which would not be affected by age heterogeneity.
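The idea of correlating deviations from the means of the several ages can be sketched directly. The data below are simulated under assumed conditions (ages 8 through 14, each trait rising 4 points per year plus independent normal noise with $\sigma = 3$), and the helper names are hypothetical. With these settings the raw $r_{12}$ is high, about $64/73 \approx .88$, while the within-age correlation is near zero.

```python
import math
import random
import statistics
from collections import defaultdict


def pearson_r(a, b):
    """Product-moment correlation of 2 equal-length lists."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def deviations_from_group_means(scores, groups):
    """Express each score as a deviation from the mean of its own group (age array)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for s, g in zip(scores, groups):
        totals[g] += s
        counts[g] += 1
    return [s - totals[g] / counts[g] for s, g in zip(scores, groups)]


rng = random.Random(7)
ages = [rng.randint(8, 14) for _ in range(7_000)]
x1 = [4 * a + rng.gauss(0, 3) for a in ages]  # hypothetical trait 1
x2 = [4 * a + rng.gauss(0, 3) for a in ages]  # hypothetical trait 2

r_raw = pearson_r(x1, x2)
r_within_age = pearson_r(deviations_from_group_means(x1, ages),
                         deviations_from_group_means(x2, ages))
print(round(r_raw, 2), round(r_within_age, 2))
```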

Partial correlation. The task of determining the correlation between 2 variables, with the influence of a third eliminated, can always be accomplished by actually computing all the deviations and then making a scatter diagram from which the $r$ can be determined, but, in those cases in which we can assume linearity of regression for $X_1$ on $X_3$ and $X_2$ on $X_3$, it is possible to set up a method for determining the desired correlation from the 3 correlation coefficients between the 3 variables. If linearity exists, we can correlate the deviations from the 2 regression lines instead of from the array means (or means for several ages if age is the third variable). Since

$$x'_1 = r_{13}\frac{\sigma_1}{\sigma_3}x_3 \quad\text{and}\quad x'_2 = r_{23}\frac{\sigma_2}{\sigma_3}x_3$$

the 2 sets of deviation-from-regression scores will be

$$x_1 - x'_1 = x_1 - r_{13}\frac{\sigma_1}{\sigma_3}x_3 \quad\text{and}\quad x_2 - x'_2 = x_2 - r_{23}\frac{\sigma_2}{\sigma_3}x_3$$

The correlation of these deviation scores, which is designated by the symbol $r_{12.3}$ (read: the correlation between $X_1$ and $X_2$ with $X_3$ held constant) and known as the partial correlation coefficient, becomes

$$r_{12.3} = \frac{\sum (x_1 - x'_1)(x_2 - x'_2)}{N\sigma_{(x_1 - x'_1)}\sigma_{(x_2 - x'_2)}}$$

Multiplying and summing the numerator, and noting that the $\sigma$'s in the denominator are nothing more than the errors of estimate, $\sigma_{1.3}$ and $\sigma_{2.3}$, we have

$$r_{12.3} = \frac{\sum x_1x_2 - r_{23}\frac{\sigma_2}{\sigma_3}\sum x_1x_3 - r_{13}\frac{\sigma_1}{\sigma_3}\sum x_2x_3 + r_{13}r_{23}\frac{\sigma_1\sigma_2}{\sigma_3^2}\sum x_3^2}{N\sigma_1\sqrt{1 - r_{13}^2}\;\sigma_2\sqrt{1 - r_{23}^2}}$$

Dividing by $N$, canceling $\sigma$'s, and collecting like terms, we get

$$r_{12.3} = \frac{r_{12} - r_{13}r_{23}}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2}} \qquad (62)$$
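Formula (62) can be checked against its own derivation: compute the 3 zero-order $r$'s and apply (62), then separately correlate the deviations of $X_1$ and $X_2$ from their least-squares regression lines on $X_3$. The 2 results agree apart from floating-point rounding, because (62) is an algebraic identity for these residuals. The simulated data (a common cause $X_3$ feeding both $X_1$ and $X_2$, plus a direct path from $X_1$ to $X_2$) and the function names are assumptions for illustration.

```python
import math
import random
import statistics


def pearson_r(a, b):
    """Product-moment correlation of 2 equal-length lists."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def partial_r(r12, r13, r23):
    """Formula (62): first-order partial r12.3."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))


def regression_residuals(y, x):
    """Deviations of y from its least-squares line on x (slope b = r * sd_y / sd_x)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return [(yi - my) - b * (xi - mx) for xi, yi in zip(x, y)]


rng = random.Random(11)
x3 = [rng.gauss(0, 1) for _ in range(2_000)]
x1 = [0.7 * c + rng.gauss(0, 1) for c in x3]
x2 = [0.5 * c + 0.3 * a + rng.gauss(0, 1) for c, a in zip(x3, x1)]

r12, r13, r23 = pearson_r(x1, x2), pearson_r(x1, x3), pearson_r(x2, x3)
by_formula = partial_r(r12, r13, r23)
by_residuals = pearson_r(regression_residuals(x1, x3), regression_residuals(x2, x3))
print(round(r12, 3), round(by_formula, 3), round(by_residuals, 3))
```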


This formula definitely assumes the linearity of the 2 regression lines for predicting $X_1$ and $X_2$ from $X_3$. Whether we correlate deviations from array means or use formula (62), we end with a correlation which has been freed of the influence of the third, or eliminated, variable. If, for example, age is the third variable, the partial correlation coefficient represents an estimate of what the correlation would be if we held age constant by the use of individuals of any one of the several age levels present in the original group.

The difference between $r_{12.3}$ and $r_{12}$ indicates how much of the correlation between variables 1 and 2 is due to the influence of heterogeneity of a third variable. Obviously, if the third variable is unrelated to $X_1$ and $X_2$, the partial $r$ will equal $r_{12}$, and if either $r_{13}$ or $r_{23}$ is negative and $r_{12}$ positive, "partialing out" $X_3$ will raise the correlation. Is this reasonable?

The difficulties encountered in determining the direction of causation make it necessary to be careful in the use of the partial correlation technique. When it is said that heterogeneity with respect to a third variable ($X_3$) has in part (or entirely) produced correlation between $X_1$ and $X_2$, one must ask how the influence of $X_3$ comes about. Now if it can be argued that variation in $X_3$ is a cause of variation in $X_1$ and $X_2$, it is readily seen that $r_{12}$ is at least in part attributable to the fact that $X_1$ and $X_2$ have a common source of variation. The partial, $r_{12.3}$, tells us the degree of correlation between $X_1$ and $X_2$ which would exist provided variation in $X_3$ were controlled. But if it cannot be claimed that $X_3$ produces variation in $X_1$ and $X_2$, the interpretation of the partial $r$ is far from clear. Suppose $X_1$ precedes $X_3$ in a temporal sense, so that we know variation in $X_3$ couldn't possibly contribute to variation in $X_1$; does it make sense to interpret $r_{12.3}$ as the correlation between $X_1$ and $X_2$ with the influence of $X_3$ nullified when we know that $X_3$ could not influence $X_1$? Stated differently, the only way that $X_3$ can produce or contribute to the correlation between $X_1$ and $X_2$ is by way of $X_3$ producing variation in $X_1$ and $X_2$.

This technique can be extended for "partialing out," or eliminating, more than 1 variable. Thus, to obtain an estimate of $r_{12}$ with $X_3$ and $X_4$ held constant, we can use

$$r_{12.34} = \frac{r_{12.4} - r_{13.4}\,r_{23.4}}{\sqrt{1 - r_{13.4}^2}\,\sqrt{1 - r_{23.4}^2}}$$

which is in terms of first-order partials calculable by formula (62).
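The second-order formula is just (62) applied to first-order partials, so a sketch can reuse a single function. A convenient sanity check, shown below, is that holding $X_4$ constant first and then removing $X_3$ gives the same $r_{12.34}$ as holding $X_3$ constant first and then removing $X_4$. The correlation values are hypothetical (chosen to form a consistent set), and the function and variable names are illustrative.

```python
import math


def partial_r(r12, r13, r23):
    """Formula (62): partial r between the first pair, eliminating the variable in r13, r23."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))


# Hypothetical zero-order correlations among X1..X4.
R = {(1, 2): 0.60, (1, 3): 0.50, (1, 4): 0.40,
     (2, 3): 0.50, (2, 4): 0.30, (3, 4): 0.20}


def r(i, j):
    """Look up r_ij regardless of subscript order."""
    return R[(min(i, j), max(i, j))]


# Route used in the text: first-order partials with X4 held constant, then remove X3.
r12_4 = partial_r(r(1, 2), r(1, 4), r(2, 4))
r13_4 = partial_r(r(1, 3), r(1, 4), r(3, 4))
r23_4 = partial_r(r(2, 3), r(2, 4), r(3, 4))
r12_34 = partial_r(r12_4, r13_4, r23_4)

# Alternative route: hold X3 constant first, then remove X4.
r12_3 = partial_r(r(1, 2), r(1, 3), r(2, 3))
r14_3 = partial_r(r(1, 4), r(1, 3), r(3, 4))
r24_3 = partial_r(r(2, 4), r(2, 3), r(3, 4))
r12_43 = partial_r(r12_3, r14_3, r24_3)

print(round(r12_34, 4), round(r12_43, 4))
```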

The sampling error of the partial coefficient may be handled by the $z$ transformation. The standard error of the corresponding $z$ will be $1/\sqrt{N - 4}$ when only one variable has been eliminated, and $1/\sqrt{N - 5}$ when two variables have been eliminated.
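A sketch of the $z$ approach in Python, assuming Fisher's $z = \tfrac{1}{2}\ln[(1 + r)/(1 - r)]$ (which `math.atanh` computes) and a multiplier of 1.96 for a roughly 95 per cent interval. The general form $1/\sqrt{N - 3 - k}$, with $k$ the number of variables eliminated, merely extends the two cases stated above and is an assumption of this sketch; `partial_r_interval` is a hypothetical name:

```python
import math


def partial_r_interval(r, n, k, z_crit=1.96):
    """Approximate confidence interval for a partial r via Fisher's z.

    n : sample size N
    k : number of variables eliminated (partialed out)
    The standard error of z is taken as 1 / sqrt(N - 3 - k), which gives
    1/sqrt(N - 4) for k = 1 and 1/sqrt(N - 5) for k = 2, as in the text.
    """
    z = math.atanh(r)               # z = (1/2) ln[(1 + r) / (1 - r)]
    se = 1 / math.sqrt(n - 3 - k)
    # Build the interval on the z scale, then transform back to r.
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


low, high = partial_r_interval(0.40, n=100, k=1)
print(f"{low:.3f} to {high:.3f}")
```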

The partial correlation coefficient based on a small sample can also be tested for significance by the $t$ technique. If one variable has been eliminated, we have

$$t = \frac{r_{12.3}}{\sqrt{1 - r_{12.3}^2}}\,\sqrt{N - 3}$$

with $df = N - 3$. An additional degree of freedom is lost for each additional variable eliminated.
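The $t$ ratio above, with the degrees of freedom reduced by one for each variable eliminated, can be sketched as follows; the function name is hypothetical, and the resulting $t$ must still be referred to a table of $t$ (not reproduced here) for the chosen significance level:

```python
import math


def partial_r_t(r, n, k=1):
    """t statistic and degrees of freedom for a partial r.

    With k variables eliminated, df = N - 2 - k (N - 3 when k = 1);
    each additional variable eliminated costs one more degree of freedom.
    """
    df = n - 2 - k
    t = r * math.sqrt(df) / math.sqrt(1 - r**2)
    return t, df


t, df = partial_r_t(0.40, n=30, k=1)
print(f"t = {t:.2f} with df = {df}")
```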

A perplexing and often-recurring question with regard to the interrelations of three variables is this: Are the correlations consistent among themselves, or, if $r_{12}$ and $r_{13}$ are known, what are the possible limits for $r_{23}$? If $r_{12} = 1$ and $r_{13} = 1$, $r_{23}$ must also equal unity; but if $r_{12} = 0$ and $r_{13} = 0$, does it follow that $r_{23} = 0$? It can be shown that the limits for the correlation $r_{23}$ will always be

$$r_{12}r_{13} \pm \sqrt{1 - r_{12}^2 - r_{13}^2 + r_{12}^2 r_{13}^2}$$

Examples:

When $r_{12}$ and $r_{13}$ each equal .90, the limits for $r_{23}$ are +.62 and +1.00;
when $r_{12}$ and $r_{13}$ each equal .50, the limits for $r_{23}$ are −.50 and +1.00;
when $r_{12}$ and $r_{13}$ each equal .25, the limits for $r_{23}$ are −.875 and +1.00.
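The limits are easily tabulated. This minimal sketch (function name hypothetical) reproduces the three examples above and adds the case $r_{12} = r_{13} = 0$, for which $r_{23}$ may lie anywhere from $-1$ to $+1$, so that zero correlations with $X_1$ do not force $r_{23} = 0$:

```python
import math


def r23_limits(r12, r13):
    """Lowest and highest values of r23 consistent with given r12 and r13."""
    center = r12 * r13
    # 1 - r12^2 - r13^2 + r12^2 r13^2 factors as (1 - r12^2)(1 - r13^2).
    half_width = math.sqrt((1 - r12**2) * (1 - r13**2))
    return center - half_width, center + half_width


for r in (0.90, 0.50, 0.25, 0.0):
    low, high = r23_limits(r, r)
    print(f"r12 = r13 = {r:.2f}: r23 between {low:+.3f} and {high:+.3f}")
```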





SUMMARY

In this chapter, consideration has been given to factors which have a bearing on the magnitude of the correlation coefficient. If any of these is operative in the case of a particular coefficient, it is the responsibility of the investigator to qualify his conclusions accordingly. Published reports of correlational studies should include:

1. A definition of the population being sampled and a statement of the method used in drawing the sample.

2. The size of the sample and an adequate treatment of sampling error by means of nonantiquated formulas.

3. The means and particularly the standard deviations of the variables being correlated, with some indication as to whether the sample is typical as regards heterogeneity with respect to the variables under consideration.

4. The reliability coefficients for the measures and the method of determining reliability.

5. A statement relative to the homogeneity of the sample with respect to possibly relevant variables such as age, sex, race.

6. A defense or precise interpretation of any reported correlations involving indexes or of any part-whole correlations.

The researcher who is cognizant of the assumptions requisite for a given interpretation of a correlation coefficient and who is also fully aware of the many factors which may affect its magnitude will not regard the correlational technique as an easy road to scientific discovery.



CHAPTER 11 


Multiple Correlation 


So far our discussion of correlation has been concerned chiefly with the prediction of one variable from another or the attributing of a portion of the variance of one variable to the action of a second variable. We shall next consider the case where it is desired to predict one variable by using several other variables as a team of predictors, or where, if causation can be assumed, an attempt is made to analyze the variance for one variable into components or parts attributable to the action of two or more other variables. There is a close connection between the predicting and the analyzing problems; let us first consider the method of predicting one variable on the basis of other variables.

THE THREE-VARIABLE PROBLEM

For simplicity, consider the problem of predicting $X_1$ from a knowledge of $X_2$ and $X_3$. The $X_1$ variable is frequently called the criterion, or dependent variable. If we had $X_1$ to be predicted from $X_2$ alone, we would have exactly the same situation as predicting $Y$ from $X$. That is, the linear prediction equation (in gross score form)

$$Y' = BX + A$$

becomes

$$X'_1 = BX_2 + A$$

and the deviation form

$$y' = bx + a$$

becomes

$$x'_1 = bx_2 + a$$

It will be recalled that the values of the constants, $B$ and $A$, or $b$ and $a$, were so determined as to give the maximum predictability, and that $B$ and $A$ turned out to be functions of the correlation coefficient between the two variables and of the means and standard deviations for the variables. The equation which resulted from giving $A$ and $B$ specific values was said to be the equation of the best-fitting line: the error of prediction was minimized.

Now, if we wish to predict $X_1$ from $X_2$ and $X_3$, we start with an equation of the form

$$X'_1 = B_2X_2 + B_3X_3 + A \qquad (63)$$

which can be written in deviation units as

$$x'_1 = b_2x_2 + b_3x_3 + a$$

Either of these forms represents the equation of a plane. It can be shown that $B_2 = b_2$ and $B_3 = b_3$. In fact, this is rather obvious when we consider the meaning of these $B$ or $b$ coefficients. They represent the slopes of the plane; $B_2$ is the slope which the plane makes with the $X_2$ axis, and $B_3$ the slope with regard to the $X_3$ axis. When we shift from raw to deviation scores, we are merely shifting the origin, or point of reference, to the intersection of the means, and this point in terms of deviation scores becomes zero. This shift of the frame of reference does not change the position or angle of the plane; hence $B_2 = b_2$ and $B_3 = b_3$. (The student will recall that, for the ordinary two-variable problem, the slope of the line was equal to $B$ or $b$.)

It remains to attach meaning to $A$ and $a$. In the equation $Y' = BX + A$, it was noted that the constant $A$ was the $Y$ intercept, i.e., the value of $Y$ where the line cut the $Y$ axis. It was also found that $a = 0$, i.e., that in the deviation form the line cut the $y$ axis at the origin. Perhaps the student has already anticipated, by analogy, that the $A$ in our three-variable equation is the value of $X_1$ where the plane cuts the $X_1$ axis, and that the value of $a$ will become zero.

Before going further, it might be well to take a look at the problem geometrically. In the case of two variables, after plotting the $X$ and $Y$ values in a scattergram, we can readily picture the meaning of $B$ and $A$, and also obtain some notion of why certain values of $B$ and $A$ will lead to better predictions than those obtained by other values. In the case of three variables, $X_1$, $X_2$, and $X_3$, we have a trio instead of a pair of measurements. In order to draw up a plot of $N$ such sets of measurements, we will need to use a three-dimensional scheme. Instead of placing a tally mark in a cell defined by an interval along the $x$ axis and one along the $y$ axis, we now have to consider a cell as defined by intervals on the $x_1$, the $x_2$, and the $x_3$ axes. Instead of a square cell, we have a cubical cell.

Suppose an individual's three scores fall in particular intervals on the $x_1$, $x_2$, and $x_3$ axes; then his "tally" will be placed in the cubicle formed at the intersection of these three intervals. The total number of cubicles will be the product of the number of intervals on each axis, and an individual's location in the "box" will depend upon all three of his scores. The student may be at a loss to know just how one could make such a three-dimensional scattergram. Actually, this diagram is not necessary, but it is of interest to imagine what such a three-way distribution would look like. If the correlations, $r_{12}$, $r_{13}$, and $r_{23}$, are fairly high (and positive), and if we think of the frequencies in the several cubicles as being represented by dots (or different degrees of density), then the swarm of dots will extend from the lower left front to the upper right back of the box. The greatest density will be at the center of this swarm, and the density or frequency will fall off in all directions from the center. The swarm will have the general shape and appearance of a watermelon (ellipsoidal).

Imagine that a plane is to be cut through this swarm. Our job is to so locate the plane that, when we start upward vertically from any point on the bottom of the box, say the spot defined by any pair of values for $X_2$ and $X_3$, we will find that the altitude, i.e., the distance along the $x_1$ axis at which the plane is reached, will constitute the best estimate of $X_1$ for individuals having those given $X_2$ and $X_3$ scores. With a little reflection, the reader can see that, of many ways of placing the plane, some positions will obviously give very poor estimates, whereas others will lead to better estimates. What we need is that plane which, for the given $N$ sets of $X_1$, $X_2$, and $X_3$ scores, will yield the best possible estimates.

The criterion of "best" is a least-squares affair: the sum of the squares of the errors of estimate shall be a minimum. The task is really that of determining the values of $A$, $B_2$, and $B_3$ in formula (63) so that

$$\Sigma(X_1 - X'_1)^2$$

is a minimum. That is, we are to assign to $A$, $B_2$, and $B_3$ those values which will permit the best possible estimate of an unknown $X_1$ when we know the $X_2$ and $X_3$ values for the individual. The principle to be used is exactly the same as that employed to obtain the optimum values of $B$ and $A$ for the two-variable problem, but the present problem is more complicated because we have to determine the values of three constants.

Derivation of regression equations. Our task is simplified if deviation scores are used and we assume $a = 0$ (if we carried $a$ along, it would prove to be zero). It is simplified somewhat more if we transform all three sets of scores into standard score form, i.e., if we set $z = (X - M)/\sigma$. Then our equation becomes

$$z'_1 = \beta_2 z_2 + \beta_3 z_3 \qquad (64)$$

It should be noted that, since we are changing the size of our unit of measure, it cannot be argued that $\beta_2$ will equal $B_2$ or $b_2$. The task now is to determine the values of the beta coefficients, $\beta_2$ and $\beta_3$, so as to have the best possible estimate of $z_1$, or so that the average of the squared errors, or

$$\frac{1}{N}\Sigma(z_1 - z'_1)^2$$

shall be a minimum. Since $z_1 - z'_1 = z_1 - \beta_2 z_2 - \beta_3 z_3$, the function, $f$, to be minimized is

$$f = \frac{\Sigma(z_1 - \beta_2 z_2 - \beta_3 z_3)^2}{N}$$

To determine the values of $\beta_2$ and $\beta_3$ which will make this function a minimum, use is made of the calculus. We take the partial derivative of the function first with respect to $\beta_2$, then with respect to $\beta_3$. Thus,

$$\frac{\partial f}{\partial \beta_2} = \frac{-2\,\Sigma z_2(z_1 - \beta_2 z_2 - \beta_3 z_3)}{N}$$

$$\frac{\partial f}{\partial \beta_3} = \frac{-2\,\Sigma z_3(z_1 - \beta_2 z_2 - \beta_3 z_3)}{N}$$

These two derivatives are to be set equal to zero and then solved simultaneously for the two unknowns, $\beta_2$ and $\beta_3$. Performing the indicated multiplications, summing, and dividing each equation by 2, we get

$$-\frac{\Sigma z_1 z_2}{N} + \beta_2\frac{\Sigma z_2^2}{N} + \beta_3\frac{\Sigma z_2 z_3}{N} = 0$$

$$-\frac{\Sigma z_1 z_3}{N} + \beta_2\frac{\Sigma z_2 z_3}{N} + \beta_3\frac{\Sigma z_3^2}{N} = 0$$

Since we are dealing with standard scores, we can now capitalize on certain properties thereof, namely, that the sum of their squares divided by $N$ is unity, while any sum of cross products divided by $N$ is the correlation between the two variables involved in the cross products. Thus, we have

$$-r_{12} + \beta_2 + \beta_3 r_{23} = 0$$

$$-r_{13} + \beta_2 r_{23} + \beta_3 = 0$$

or

$$\begin{aligned} \beta_2 + r_{23}\beta_3 - r_{12} &= 0 \\ r_{23}\beta_2 + \beta_3 - r_{13} &= 0 \end{aligned} \qquad (65)$$


Since the $r$'s in the equations are determinable for any given sample of data, they are in effect knowns, whereas the $\beta$'s are unknowns. We therefore have two simultaneous equations with two unknowns. These can readily be solved by a number of methods which the student will find in an algebra textbook. Straightforward solution gives

$$\beta_2 = \frac{r_{12} - r_{13}r_{23}}{1 - r_{23}^2} \qquad \beta_3 = \frac{r_{13} - r_{12}r_{23}}{1 - r_{23}^2}$$
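As a check on the algebra, the two weights can be computed and substituted back into equations (65). In this sketch the correlations .55, .53, and .56 are borrowed from the Minnesota example later in the chapter purely as inputs, and `betas_two_predictors` is a hypothetical name. When $r_{23} = 0$ the weights reduce to the zero-order correlations $r_{12}$ and $r_{13}$:

```python
def betas_two_predictors(r12, r13, r23):
    """Solve equations (65) for beta2 and beta3 (standard score weights)."""
    denom = 1 - r23**2
    beta2 = (r12 - r13 * r23) / denom
    beta3 = (r13 - r12 * r23) / denom
    return beta2, beta3


r12, r13, r23 = 0.55, 0.53, 0.56
b2, b3 = betas_two_predictors(r12, r13, r23)

# Substitute back into equations (65): both left-hand sides should be about zero.
print(b2 + r23 * b3 - r12, r23 * b2 + b3 - r13)
```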

As soon as we have computed the $r$'s, we can easily determine the $\beta$'s. The obtained numerical values can then be substituted in the prediction equation

$$z'_1 = \beta_2 z_2 + \beta_3 z_3$$

so that for a given pair of $z_2$ and $z_3$ values we can predict the standard score on the criterion variable. However, in practice it is ordinarily more convenient to deal with raw scores; hence we need our prediction equation in raw score form. Obviously, if we replace the $z$'s in the above equation by their values in terms of raw scores, means, and standard deviations, we will have

$$\frac{X'_1 - M_1}{\sigma_1} = \beta_2\frac{X_2 - M_2}{\sigma_2} + \beta_3\frac{X_3 - M_3}{\sigma_3}$$

or

$$\frac{X'_1}{\sigma_1} - \frac{M_1}{\sigma_1} = \beta_2\frac{X_2}{\sigma_2} - \beta_2\frac{M_2}{\sigma_2} + \beta_3\frac{X_3}{\sigma_3} - \beta_3\frac{M_3}{\sigma_3}$$

Multiplying by $\sigma_1$ and rearranging terms, we have

$$X'_1 = \beta_2\frac{\sigma_1}{\sigma_2}X_2 + \beta_3\frac{\sigma_1}{\sigma_3}X_3 + \left(M_1 - \beta_2\frac{\sigma_1}{\sigma_2}M_2 - \beta_3\frac{\sigma_1}{\sigma_3}M_3\right) \qquad (66)$$

from which we see that our original $B_2$ must equal $\beta_2(\sigma_1/\sigma_2)$, $B_3 = \beta_3(\sigma_1/\sigma_3)$, and $A$ = the parenthesized term. Thus we can readily determine the numerical values of $B_2$, $B_3$, and $A$ and thereby have the constants for the prediction equation. Actually, the values of $B_2$ and $B_3$ are the optimum weights to be assigned to $X_2$ and $X_3$ in order to predict $X_1$.
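The conversion to raw-score constants can be sketched as below. The means, standard deviations, and weights are hypothetical, chosen only to make the arithmetic easy to follow; the last line confirms that the plane passes through the point of means, so that an individual at the mean on every predictor is assigned $X'_1 = M_1$:

```python
def raw_score_equation(betas, means, sds):
    """Convert standard-score weights to the raw-score B's and A of formula (66).

    means[0], sds[0] refer to the criterion X1; the remaining entries refer to
    the predictors, in the same order as `betas`.
    """
    m1, s1 = means[0], sds[0]
    B = [beta * s1 / s for beta, s in zip(betas, sds[1:])]
    A = m1 - sum(b * m for b, m in zip(B, means[1:]))
    return B, A


# Hypothetical values, for illustration only.
B, A = raw_score_equation(betas=[0.40, 0.25], means=[50.0, 20.0, 100.0], sds=[10.0, 4.0, 15.0])
print(B, A)

# Predicting at the predictor means returns the criterion mean M1.
print(B[0] * 20.0 + B[1] * 100.0 + A)
```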

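Error of estimate. The accuracy of the prediction of $X_1$ by the best combination of $X_2$ and $X_3$ can be ascertained by examining the error term, i.e., $X_1 - X'_1$ or $\sigma_1(z_1 - z'_1)$. The sum of the squares of the errors divided by $N$ will yield the variance of the errors. The square root would correspond to the standard error of estimate. Let $\sigma_{z_{1.23}}$ be this error (in sigma units); then

$$\begin{aligned}
\sigma_{z_{1.23}}^2 &= \frac{\Sigma(z_1 - z'_1)^2}{N} = \frac{\Sigma(z_1 - \beta_2 z_2 - \beta_3 z_3)^2}{N} \\
&= \frac{\Sigma z_1^2}{N} + \beta_2^2\frac{\Sigma z_2^2}{N} + \beta_3^2\frac{\Sigma z_3^2}{N} - 2\beta_2\frac{\Sigma z_1 z_2}{N} - 2\beta_3\frac{\Sigma z_1 z_3}{N} + 2\beta_2\beta_3\frac{\Sigma z_2 z_3}{N} \\
&= 1 + \beta_2^2 + \beta_3^2 - 2\beta_2 r_{12} - 2\beta_3 r_{13} + 2\beta_2\beta_3 r_{23}
\end{aligned}$$

By equations (65), $\beta_2^2 + \beta_3^2 + 2\beta_2\beta_3 r_{23} = \beta_2(\beta_2 + \beta_3 r_{23}) + \beta_3(\beta_3 + \beta_2 r_{23}) = \beta_2 r_{12} + \beta_3 r_{13}$, so the expression reduces to

$$\sigma_{z_{1.23}}^2 = 1 - (\beta_2 r_{12} + \beta_3 r_{13}) \qquad (67)$$

in terms of standard scores. Then $\sigma_1^2$ times this would give the error variance for raw scores.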
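The reduction from the expanded form to formula (67) can be confirmed numerically. In this sketch the correlations and $\sigma_1 = 2.09$ are borrowed from the Minnesota example later in the chapter only as convenient inputs; the function name is hypothetical:

```python
def error_variance_check(r12, r13, r23):
    """Expanded and reduced (formula (67)) forms of the standard-score error variance."""
    denom = 1 - r23**2
    b2 = (r12 - r13 * r23) / denom
    b3 = (r13 - r12 * r23) / denom
    expanded = 1 + b2**2 + b3**2 - 2 * b2 * r12 - 2 * b3 * r13 + 2 * b2 * b3 * r23
    reduced = 1 - (b2 * r12 + b3 * r13)
    return expanded, reduced


expanded, reduced = error_variance_check(0.55, 0.53, 0.56)
sigma1 = 2.09                                    # illustrative raw-score sigma of X1
print(expanded, reduced, sigma1**2 * reduced)    # last value: raw-score error variance
```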

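Multiple r. We next define the multiple correlation coefficient as the correlation between $z_1$ and the best estimate of $z_1$ from a knowledge of $z_2$ and $z_3$. In symbols,

$$r_{1.23} = \frac{\Sigma z_1(\beta_2 z_2 + \beta_3 z_3)}{N\sigma_{z'_1}} \qquad (68)$$

Note that, although $\sigma_{z_1} = 1$, it does not follow that $\sigma_{z'_1} = 1$. In order to evaluate this last $\sigma$, we write

$$z_1 = z'_1 + z_{1.23}$$

That is, we think of $z_1$ as being made up of two parts, that which we can estimate plus a residual. It can easily be shown that these two parts are uncorrelated; hence by the variance theorem we have

$$\sigma_{z_1}^2 = \sigma_{z'_1}^2 + \sigma_{z_{1.23}}^2$$

or

$$1 = \sigma_{z'_1}^2 + \sigma_{z_{1.23}}^2$$

then

$$\sigma_{z'_1}^2 = 1 - \sigma_{z_{1.23}}^2$$

But $\sigma_{z_{1.23}}^2$ is nothing more than the variance of the prediction errors as given by (67); therefore

$$\sigma_{z'_1}^2 = \beta_2 r_{12} + \beta_3 r_{13}$$

Then, by substituting in formula (68), we have

$$r_{1.23} = \frac{\Sigma z_1(\beta_2 z_2 + \beta_3 z_3)}{N\sqrt{\beta_2 r_{12} + \beta_3 r_{13}}} = \frac{\beta_2\Sigma z_1 z_2 + \beta_3\Sigma z_1 z_3}{N\sqrt{\beta_2 r_{12} + \beta_3 r_{13}}} = \frac{\beta_2 r_{12} + \beta_3 r_{13}}{\sqrt{\beta_2 r_{12} + \beta_3 r_{13}}}$$

$$r_{1.23} = \sqrt{\beta_2 r_{12} + \beta_3 r_{13}} \qquad (69)$$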
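Definition (68) and formula (69) can be compared directly on simulated data. The sketch below generates hypothetical scores with Python's `random` module, converts them to standard scores with $N$ (not $N - 1$) in the denominator of $\sigma$, computes the $\beta$'s, and then correlates $z_1$ with $z'_1$; the two results should agree to within rounding error. All names and the simulated data are illustrative:

```python
import math
import random


def standardize(xs):
    """Standard scores using sigma with N in the denominator, as in the text."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return [(x - mean) / sd for x in xs]


def mean_cross_product(za, zb):
    """For standard scores, the correlation is the mean cross product."""
    return sum(a * b for a, b in zip(za, zb)) / len(za)


# Simulated (hypothetical) scores for N individuals.
random.seed(1)
N = 200
x2 = [random.gauss(0, 1) for _ in range(N)]
x3 = [0.5 * a + random.gauss(0, 1) for a in x2]
x1 = [0.6 * a + 0.3 * b + random.gauss(0, 1) for a, b in zip(x2, x3)]
z1, z2, z3 = standardize(x1), standardize(x2), standardize(x3)

r12 = mean_cross_product(z1, z2)
r13 = mean_cross_product(z1, z3)
r23 = mean_cross_product(z2, z3)
beta2 = (r12 - r13 * r23) / (1 - r23**2)
beta3 = (r13 - r12 * r23) / (1 - r23**2)

# Definition (68): correlate z1 with its best estimate z'1.
z1_pred = [beta2 * a + beta3 * b for a, b in zip(z2, z3)]
r_direct = mean_cross_product(z1, standardize(z1_pred))

# Formula (69).
r_formula = math.sqrt(beta2 * r12 + beta3 * r13)
print(round(r_direct, 6), round(r_formula, 6))
```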





We thus see that, as soon as the $\beta$'s are determined, we can write the regression equation for predicting $z_1$ from $z_2$ and $z_3$ and can also specify the degree of correlation and calculate the error of estimate. This error obviously can be written from formulas (67) and (69) as

$$\sigma_{1.23} = \sigma_1\sqrt{1 - r_{1.23}^2} \qquad (70)$$

which is in terms of raw scores.

Formula (70) has been used frequently to define the multiple correlation coefficient. Stated explicitly,

$$r_{1.23}^2 = 1 - \frac{\sigma_{1.23}^2}{\sigma_1^2} = 1 - \sigma_{z_{1.23}}^2$$

Then, by substituting from (67), we again arrive at (69).

The student will note the similarity of formula (70) to the ordinary error of estimate for the bivariate situation. Thus the multiple correlation coefficient can be interpreted, in terms of reduction in the error of estimate, in exactly the same manner as the ordinary bivariate correlation coefficient. The only difference is that we are now determining the regression coefficients, or weights, for two variables as a team so as to get the best possible prediction of a third variable, whereas in the bivariate situation only one regression coefficient is necessary. A multiple correlation coefficient of .60 has, aside from minor qualifications to be discussed later, the same meaning in a predictive sense as an ordinary correlation of .60. Furthermore, the interpretation in terms of contribution to variance also holds for the multiple correlation coefficient; i.e., if one can assume causation, it may be said that a multiple $r$ of .60 indicates that 36 per cent of the variance in the criterion or dependent variable can be attributed to variation in the two independent variables.

Relative weights. The question arises as to the relative importance of the two variables as contributors to variation in the criterion variable. The $B$ coefficients in the regression equation have, at times, been misinterpreted as indicating the relative contribution of the two independent variables. The reader need only be reminded that the two $B$ coefficients usually involve different units of measurement (one may be in terms of feet and the other in pounds); hence they are not comparable at all. If $B_2$ is numerically twice $B_3$, it does not follow that $X_2$ is twice as important as $X_3$. In order to get around this difficulty, we must think in terms of standard scores; these will be comparable, and hence the $\beta$ coefficients in the standard score form of the regression equation will be comparable.

Since

$$\sigma_{z_1}^2 = \sigma_{z'_1}^2 + \sigma_{z_{1.23}}^2$$

or

$$1 = \sigma_{z'_1}^2 + \sigma_{z_{1.23}}^2$$

and

$$\sigma_{z_{1.23}}^2 = 1 - r_{1.23}^2$$

it follows that

$$\sigma_{z'_1}^2 = r_{1.23}^2$$

That is, $r_{1.23}^2$, which corresponds to the percentage of variance explained, is equal to $\sigma_{z'_1}^2$, or the variance of the predicted standard scores. This variance could be determined by actually making $N$ predictions of $z'_1$ from the $N$ pairs of values of $z_2$ and $z_3$ and then computing the sigma for the distribution of these predicted values. This is not done in practice, since the value of this sigma squared is $r_{1.23}^2$, which is easily calculated once the $\beta$'s have been determined.

But note that, since

$$z'_1 = \beta_2 z_2 + \beta_3 z_3$$

we can indicate the value of $\sigma_{z'_1}^2$ as

$$\sigma_{z'_1}^2 = \frac{\Sigma(z'_1)^2}{N} = \frac{\Sigma(\beta_2 z_2 + \beta_3 z_3)^2}{N}$$

which becomes

$$\sigma_{z'_1}^2 = \beta_2^2 + \beta_3^2 + 2\beta_2\beta_3 r_{23} \qquad (71)$$

In other words, the predicted variance, which corresponds to the "explained" variance, can be broken down into three additive components. We thus see that the relative importance of the variables $X_2$ and $X_3$ in "explaining" or "causing" variation in $X_1$ can be judged by the magnitude of the squares of the $\beta$ coefficients. The third term in formula (71) represents a joint contribution which, it will be seen, is a function of the amount of correlation between the two predicting variables.
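The three additive components of formula (71) can be tabulated directly. This sketch (hypothetical function name, illustrative correlations) also confirms that they sum to $r^2_{1.23} = \beta_2 r_{12} + \beta_3 r_{13}$, and that the joint term vanishes when $r_{23} = 0$:

```python
def explained_variance_parts(r12, r13, r23):
    """Split the predicted (explained) variance into the three terms of formula (71)."""
    denom = 1 - r23**2
    b2 = (r12 - r13 * r23) / denom
    b3 = (r13 - r12 * r23) / denom
    parts = {
        "X2 (beta2 squared)": b2**2,
        "X3 (beta3 squared)": b3**2,
        "joint (2 beta2 beta3 r23)": 2 * b2 * b3 * r23,
    }
    r_squared = b2 * r12 + b3 * r13   # the square of formula (69)
    return parts, r_squared


parts, r_squared = explained_variance_parts(0.55, 0.53, 0.56)
for name, value in parts.items():
    print(f"{name:28s}{value:.4f}")
print(f"{'sum of the three terms':28s}{sum(parts.values()):.4f}  vs  r^2 = {r_squared:.4f}")
```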





Summarizing, it can be said that the fundamental problem in multiple correlation is that of obtaining the optimum weighting to be assigned to independent variables ($X_2$ and $X_3$) in predicting or explaining variation in a dependent variable, $X_1$. That is, we determine the values of $B_2$, $B_3$, and $A$ in the equation

$$X'_1 = B_2X_2 + B_3X_3 + A$$

so as to get the best possible estimate of $X_1$. This is resolved by working with the prediction equation in standard score form with $\beta$ coefficients. The value of each $\beta$ is determinable from the intercorrelations among the three variables. Once the $\beta$'s are calculated, we can: (1) readily compute the $B$ coefficients needed in the raw score form of the prediction equation; (2) determine the value of the multiple correlation coefficient and the error of estimate; (3) ascertain the relative importance of the independent variables as predictors or, if causation can be assumed, as contributors to the variance of the dependent or criterion variable. It is important to note that the multiple correlation coefficient represents the maximum correlation to be expected between the dependent variable and a linearly additive combination of $X_2$ and $X_3$.


MORE THAN THREE VARIABLES 

Suppose that we have a dependent variable and four independent variables which might be used as predictors or which might be thought of as causes of variation in the dependent variable. The cause-and-effect, as opposed to concomitant, relationship among variables is a logical problem which must be faced by the investigator as a logician rather than as a statistician. Whether one resorts to the multiple correlation technique as an aid in predicting or as an aid in analysis will depend entirely upon the problem being attacked; the mechanical solution is the same, but the investigator must choose the interpretation which best suits his purpose.

For a five-variable problem, we need the constants in the regression or prediction equation,

$$X'_1 = B_2X_2 + B_3X_3 + B_4X_4 + B_5X_5 + A$$

which can be written in standard score form as

$$z'_1 = \beta_2 z_2 + \beta_3 z_3 + \beta_4 z_4 + \beta_5 z_5$$





As in the three-variable situation, the problem is that of determining the optimum values of the $B$'s or the $\beta$'s so as to get the best possible prediction of $X_1$ or $z_1$, i.e., so that

$$\frac{\Sigma(X_1 - X'_1)^2}{N} \quad\text{or}\quad \frac{\Sigma(z_1 - z'_1)^2}{N}$$

shall be as small as possible. The mathematical solution is easier by way of the standard score form of the regression equation. We have the function

$$f = \frac{\Sigma(z_1 - z'_1)^2}{N} = \frac{\Sigma(z_1 - \beta_2 z_2 - \beta_3 z_3 - \beta_4 z_4 - \beta_5 z_5)^2}{N} \qquad (72)$$

which is to be minimized by assigning proper values to the $\beta$'s. These values are obtained by taking the derivative of the function with respect to, and in order for, each of the $\beta$'s. This will yield four derivatives which, when set equal to zero, will give us four equations involving the four unknown $\beta$'s. These equations can then be solved as simultaneous equations in order to determine the values of the $\beta$'s. The obtained $\beta$'s will be such that the sum of the squares of $z_1 - z'_1$ will be the least possible; i.e., we will have the best possible estimate of $z_1$ from an additive combination of the four independent variables.
The student of the calculus can readily verify that the four equations obtained by taking derivatives of formula (72) will take the following form (when set equal to zero):

$$\begin{aligned}
\beta_2 + \beta_3 r_{23} + \beta_4 r_{24} + \beta_5 r_{25} - r_{12} &= 0 \\
\beta_2 r_{23} + \beta_3 + \beta_4 r_{34} + \beta_5 r_{35} - r_{13} &= 0 \\
\beta_2 r_{24} + \beta_3 r_{34} + \beta_4 + \beta_5 r_{45} - r_{14} &= 0 \\
\beta_2 r_{25} + \beta_3 r_{35} + \beta_4 r_{45} + \beta_5 - r_{15} &= 0
\end{aligned} \qquad (73)$$

These equations result from steps exactly parallel to those used for the three-variable problem. The four $\beta$'s are unknowns, whereas, for any given batch of data, the $r$'s take on specific numerical values.
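For a machine check of any solution of equations (73), the system can be solved by ordinary Gaussian elimination. In the sketch below the intercorrelations are those of the Minnesota example given later in this chapter; the solver is a generic one, not the Doolittle scheme, and `solve_normal_equations` is a hypothetical name. The multiple $r$ then follows from formula (69a):

```python
import math


def solve_normal_equations(Rxx, r1):
    """Solve sum_j Rxx[i][j] * beta_j = r1[i] for the betas (equations (73)).

    Rxx : intercorrelations among the predictors X2..Xn (list of lists)
    r1  : correlations of X1 with X2..Xn
    Plain Gaussian elimination with partial pivoting.
    """
    m = len(r1)
    # Augmented matrix [Rxx | r1].
    A = [list(map(float, row)) + [float(c)] for row, c in zip(Rxx, r1)]
    for c in range(m):
        p = max(range(c, m), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(c + 1, m):
            f = A[r][c] / A[c][c]
            for k in range(c, m + 1):
                A[r][k] -= f * A[c][k]
    # Back substitution.
    beta = [0.0] * m
    for i in reversed(range(m)):
        s = sum(A[i][j] * beta[j] for j in range(i + 1, m))
        beta[i] = (A[i][m] - s) / A[i][i]
    return beta


Rxx = [
    [1.00, 0.56, 0.49, 0.42],
    [0.56, 1.00, 0.63, 0.46],
    [0.49, 0.63, 1.00, 0.39],
    [0.42, 0.46, 0.39, 1.00],
]
r1 = [0.55, 0.53, 0.52, 0.64]
beta = solve_normal_equations(Rxx, r1)
multiple_r = math.sqrt(sum(b * r for b, r in zip(beta, r1)))   # formula (69a)
print([round(b, 3) for b in beta], round(multiple_r, 3))
```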





The extension of multiple correlation to include any number of variables involves the same principles as utilized here for the three- and the five-variable problems. For $n$ variables, formula (64) becomes

$$z'_1 = \beta_2 z_2 + \beta_3 z_3 + \cdots + \beta_n z_n \qquad (64a)$$

The extension of (66) as the gross score equation should be obvious. Formula (69) for the multiple correlation coefficient becomes

$$r_{1.23\cdots n} = \sqrt{\beta_2 r_{12} + \beta_3 r_{13} + \cdots + \beta_n r_{1n}} \qquad (69a)$$

To solve for the unknown $\beta$'s the student may resort to any of the schemes given in algebra textbooks for solving simultaneous equations. One method is by way of determinants and Cramer's rule. The coefficients of the unknowns are the intercorrelations among the four independent variables, whereas the constants in these equations are the respective correlations of the dependent with the independent variables. In the application of Cramer's rule, these constants are thought of as being on the right-hand side of the equation, i.e., shifted to the right of the equality mark, with the consequent change of sign. The student should keep in mind, however, the fact that the original sign of any of the computed correlation coefficients must be considered.

Solution by Cramer's rule becomes quite tedious and burdensome for a problem involving more than four or five variables. Indeed, this determinantal solution is practically impossible for problems involving a large number of variables. Fortunately, there is available a simplified solution, but before turning to it, we would like to indicate some algebraic manipulations in terms of determinants.

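It will be noted from the above simultaneous equations that all the intercorrelations among the five variables are involved. One can conveniently arrange these correlations in a table, or in determinantal form. Thus we can define a major determinant as

$$D = \begin{vmatrix}
1 & r_{12} & r_{13} & r_{14} & r_{15} \\
r_{12} & 1 & r_{23} & r_{24} & r_{25} \\
r_{13} & r_{23} & 1 & r_{34} & r_{35} \\
r_{14} & r_{24} & r_{34} & 1 & r_{45} \\
r_{15} & r_{25} & r_{35} & r_{45} & 1
\end{vmatrix}$$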





If we were to delete the first row and first column, the minor which remains would involve the intercorrelations among the four independent variables. This minor might be conveniently symbolized as $D_{11}$; i.e., we have deleted the column and the row which involve the subscript 1. If we were to delete the row which involves the subscript 1 and the column involving the subscript 2 throughout, we would symbolize the resulting minor as $D_{12}$.

Now it can be shown that

$$\beta_2 = \frac{D_{12}}{D_{11}}$$

or any $\beta$, say $\beta_p$, will be

$$\beta_p = (-1)^p\,\frac{D_{1p}}{D_{11}}$$

where the quantity $(-1)^p$ is an indicator of either a positive or a negative sign, but the ultimate sign of $\beta_p$ is also dependent upon whether the numerical values of the determinants are positive or negative. It can also be shown that the multiple correlation coefficient can be written as a function of determinants, thus

$$r_{1.2345}^2 = 1 - \frac{D}{D_{11}}$$

The student who is interested in following a treatment of multiple correlation in terms of determinants is referred to T. L. Kelley's Statistical Method.*
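The determinantal expressions can be verified numerically for the five-variable case. This sketch computes determinants by Gaussian elimination rather than by expansion in minors (an implementation choice, not the book's), uses the intercorrelations of the Minnesota example that follows, and forms $\beta_p = (-1)^p D_{1p}/D_{11}$ and $r^2_{1.2345} = 1 - D/D_{11}$; the helper names are hypothetical:

```python
import math


def det(M):
    """Determinant by Gaussian elimination with partial pivoting."""
    A = [list(map(float, row)) for row in M]
    n = len(A)
    d = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        if A[p][c] == 0.0:
            return 0.0
        if p != c:
            A[c], A[p] = A[p], A[c]
            d = -d                      # a row interchange changes the sign
        d *= A[c][c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for k in range(c, n):
                A[r][k] -= f * A[c][k]
    return d


def minor(M, i, j):
    """Determinant left after deleting row i and column j (0-based)."""
    return det([row[:j] + row[j + 1:] for k, row in enumerate(M) if k != i])


# The major determinant D, variables ordered X1, X2, ..., X5.
D = [
    [1.00, 0.55, 0.53, 0.52, 0.64],
    [0.55, 1.00, 0.56, 0.49, 0.42],
    [0.53, 0.56, 1.00, 0.63, 0.46],
    [0.52, 0.49, 0.63, 1.00, 0.39],
    [0.64, 0.42, 0.46, 0.39, 1.00],
]
D11 = minor(D, 0, 0)
# beta_p = (-1)^p D_1p / D_11, with p = 2..5 in the book's 1-based numbering.
betas = {p: (-1) ** p * minor(D, 0, p - 1) / D11 for p in range(2, 6)}
r_squared = 1 - det(D) / D11
print({p: round(b, 3) for p, b in betas.items()}, round(math.sqrt(r_squared), 3))
```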


NUMERICAL SOLUTION 

The solution of the simultaneous equations for the unknown $\beta$'s can best be accomplished by resort to Doolittle's method. This method is applicable to the solution of any set of simultaneous equations involving a major determinant which, like $D$, is symmetrical about the diagonal. It is also applicable to problems involving fewer or more than five variables. The first step is to write down the intercorrelations (coefficients of the unknown $\beta$'s) in the form indicated in Table 14, in which the right-hand column contains the correlation of each independent variable with the criterion or dependent variable. Negative signs are attached to these coefficients because, in essence, we are dealing with equations (73). Obviously, if the original sign of an $r$ were negative, it would be preceded by a plus sign in an arrangement like that in Table 14.

* Kelley, T. L., Statistical Method. New York: Macmillan, 1924.

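Table 14. Schema for Arranging r's for Doolittle Solution

$$\begin{array}{ccccc}
X_2 & X_3 & X_4 & X_5 & X_1 \\
\hline
1 & r_{23} & r_{24} & r_{25} & -r_{12} \\
 & 1 & r_{34} & r_{35} & -r_{13} \\
 & & 1 & r_{45} & -r_{14} \\
 & & & 1 & -r_{15}
\end{array}$$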


As a numerical example, we shall use data from the Minnesota study of mechanical ability.† The sample size is 100.

Let $X_1$ = Criterion (mechanical performance-quality)
$X_2$ = Minnesota assembling test
$X_3$ = Minnesota spatial relations test
$X_4$ = Paper form board
$X_5$ = Interest analysis blank

Since the several means and standard deviations will be needed, these are recorded in Table 15.

Table 15. Means and σ's (Minnesota Data)

$$\begin{array}{lrrrrr}
 & X_1 & X_2 & X_3 & X_4 & X_5 \\
\hline
M & 14.94 & 127.56 & 1422.90 & 46.60 & 107.00 \\
\sigma & 2.09 & 25.32 & 296.39 & 19.45 & 18.00
\end{array}$$

In Table 16 will be found the Doolittle solution for the $\beta$ coefficients. Once these are known, the regression equation, in raw score form, can be written, and the multiple $r$ and the error of estimate can be determined. The table includes an indication of the calculation of these values. The student will have to study carefully the schema of the Doolittle solution in order to grasp the necessary steps. We shall not attempt a complete exposition of the steps, since the procedure of each step is indicated in the left-hand side of the table. A few remarks, however, will be of aid to the student.

† Paterson, D. G., et al., Minnesota Mechanical Ability Tests. Minneapolis: University of Minnesota Press, 1930.





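Table 16. Computation of Multiple r

$$\begin{array}{lrrrrrr}
 & X_2 & X_3 & X_4 & X_5 & X_1 & \text{ck} \\
\hline
\text{(a)} & 1.00 & .56 & .49 & .42 & -.55 & 1.92 \\
\text{(b)} & & 1.00 & .63 & .46 & -.53 & 2.12 \\
\text{(c)} & & & 1.00 & .39 & -.52 & 1.99 \\
\text{(d)} & & & & 1.00 & -.64 & 1.63 \\
\hline
\text{(1) line (a)} & 1.00 & .56 & .49 & .42 & -.55 & 1.92 \\
\text{(2)} & -1.00 & -.56 & -.49 & -.42 & .55 & -1.92 \\
\hline
\text{(3) line (b)} & & 1.000 & .63 & .46 & -.53 & 2.12 \\
\text{(4) (1)(-.56)} & & -.314 & -.274 & -.235 & .308 & -1.075 \\
\text{(5) (3) + (4)} & & .686 & .356 & .225 & -.222 & 1.045\ \text{ck} \\
\text{(6) (5)(-1/.686)} & & -1.000 & -.519 & -.328 & .324 & -1.524\ \text{ck} \\
\hline
\text{(7) line (c)} & & & 1.000 & .39 & -.52 & 1.99 \\
\text{(8) (1)(-.49)} & & & -.240 & -.206 & .270 & -.941 \\
\text{(9) (5)(-.519)} & & & -.185 & -.117 & .115 & -.542 \\
\text{(10) (7) + (8) + (9)} & & & .575 & .067 & -.135 & .507\ \text{ck} \\
\text{(11) (10)(-1/.575)} & & & -1.000 & -.116 & .235 & -.882\ \text{ck} \\
\hline
\text{(12) line (d)} & & & & 1.000 & -.64 & 1.63 \\
\text{(13) (1)(-.42)} & & & & -.176 & .231 & -.806 \\
\text{(14) (5)(-.328)} & & & & -.074 & .073 & -.343 \\
\text{(15) (10)(-.116)} & & & & -.008 & .016 & -.059 \\
\text{(16) (12) + (13) + (14) + (15)} & & & & .742 & -.320 & .422\ \text{ck} \\
\text{(17) (16)(-1/.742)} & & & & -1.000 & .431 & -.569\ \text{ck}
\end{array}$$

Back solution

From (17): $\beta_5 = .431$
From (11): $\beta_4 = (.431)(-.116) + .235 = .185$
From (6): $\beta_3 = (.185)(-.519) + (.431)(-.328) + .324 = .087$
From (2): $\beta_2 = (.087)(-.56) + (.185)(-.49) + (.431)(-.42) + .55 = .230$

Final checks

$(.230)(1.00) + (.087)(.56) + (.185)(.49) + (.431)(.42) - .55 = .000$
$(.230)(.56) + (.087)(1.00) + (.185)(.63) + (.431)(.46) - .53 = .001$
$(.230)(.49) + (.087)(.63) + (.185)(1.00) + (.431)(.39) - .52 = .001$
$(.230)(.42) + (.087)(.46) + (.185)(.39) + (.431)(1.00) - .64 = .000$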

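From formula (66):

$B_2 = (.230)\dfrac{2.09}{25.32} = .0190$, $B_3 = (.087)\dfrac{2.09}{296.39} = .0006$, $B_4 = (.185)\dfrac{2.09}{19.45} = .0199$, $B_5 = (.431)\dfrac{2.09}{18.00} = .0500$, $A = 5.40$

Thus

$X'_1 = .0190X_2 + .0006X_3 + .0199X_4 + .0500X_5 + 5.40$

$r_{1.2345}^2 = (.230)(.55) + (.087)(.53) + (.185)(.52) + (.431)(.64) = .54465$

$r_{1.2345} = .738$, $\sigma_{1.2345} = 2.09\sqrt{1 - (.738)^2} = 1.40$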







As already specified, the correlations are written down in an order corresponding to equations (73), except that values to the left of and below the diagonal are omitted. The first thing we do is to set up a check column. The first entry, 1.92, is obtained by summing, algebraically, the first row of correlations (including the diagonal 1.00); the second figure, 2.12, is the sum of the second row plus .56; the third entry, 1.99, is the sum of the third row plus .49 and .63; and the 1.63 is the sum of the fourth row plus .42, .46, and .39. The rule being followed should now be obvious: the $j$th entry in the check column is obtained by summing the 1.00 in the $j$th row with the values above it and to its right. The student should satisfy himself that this is equivalent to summing the correlations for the respective equations in (73). Since the check column will provide, at intervals, an automatic check on our computations, this summing should be done at least twice to insure accuracy.

Line (1) of the solution is obtained by copying down line (a), the first row of $r$'s; and line (2) consists of the line (1) values with the signs changed. The second part of the solution begins with line (3), which is obtained by copying down the (b) row of correlations. Line (4) is obtained by multiplying entries in line (1) by −.56, which figure is found in line (2) directly above the 1.000 of line (3). As indicated at the left, line (5) results from summing lines (3) and (4), i.e., 1.000 + (−.314) equals .686, etc.

At this point we have our first automatic check: summing line (5) across should yield 1.045, already obtained by vertical summing of values in the check column. To be a satisfactory check, these two sums should agree within limits consistent with errors imposed by rounding off to three decimal places. Acceptable discrepancies will be of the order ±.001, ±.002, ..., ±.005, seldom larger.

Line (6) is obtained by multiplying line (5) by the negative reciprocal of its first entry. The correctness of the reciprocal used is evidenced by the fact that, when multiplied by .686, unity results. The ck attached to −1.524 indicates that summing the entries in line (6) yields the same value as 1.045 multiplied by the negative reciprocal of .686, thus providing a further check. This completes the second part of the solution.

The third part begins with a copying of row (c) of the correlation table. The student should now be able to follow the steps; in particular, he should note that a multiplier is secured from the last line of each preceding part of the solution; that each multiplier is applied in turn to the values in the line just above it; that, when all such multipliers have been utilized, the lines are summed (summing across again provides a check); and that the resulting line is, as before, multiplied by the negative reciprocal of its first entry, thus completing the third part of the solution.

The fourth part involves similar operations. If we had five
independent variables, we would proceed in like fashion, with an
additional or fifth part. The schema can be extended to any
number of variables. There will be as many parts to the solution
as there are independent variables. The last part always consists
of three columns of figures, and the bottom figure in the middle
column is the value for $\beta_n$. In our example $\beta_5 = .431$.

The other β's are determined by a "back" solution, which always
involves a substitution of the value or values already found into
the last line of the various parts (lines 11, 6, and 2 in our illustra-
tion). This back solution is given in Table 16. As a final check
on all the computations, the four β's obtained must be substituted
into the four simultaneous equations with which we began. This
check appears next in Table 16.
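The forward and back solutions just described amount to Gaussian elimination laid out so as to exploit the symmetry of the correlation table. The Python sketch below mirrors that layout under stated assumptions: the names `doolittle` and `multiple_r` are ours, the predictor intercorrelations are passed as a full symmetric table `R` and the criterion correlations as `r`, and the criterion column is carried with reversed sign so that the bottom figure in the middle column of the last part is the final β directly (Table 16 may follow a different sign convention). Each part copies a row, adds earlier summed lines times multipliers read from the earlier negative-reciprocal lines, applies the summing-across check, and multiplies by the negative reciprocal of its first entry; the back solution then substitutes upward, and the final check substitutes the β's into the original equations.

```python
def doolittle(R, r):
    """Sketch of an abbreviated Doolittle solution of R @ beta = r.

    R: intercorrelations among the predictors (symmetric, 1.0 on diagonal).
    r: correlations of the predictors with the criterion.
    """
    k = len(R)
    # Working rows: predictor r's, the criterion r with reversed sign
    # (an assumed convention), and a check entry equal to the row sum.
    rows = []
    for i in range(k):
        row = list(R[i]) + [-r[i]]
        rows.append(row + [sum(row)])

    A_lines, B_lines = [], []  # summed line of each part, and that line times
                               # the negative reciprocal of its first entry
    for i in range(k):
        line = rows[i][:]
        # Multiplier for earlier part j: entry of its B line in column i.
        for A_j, B_j in zip(A_lines, B_lines):
            mult = B_j[i]
            line = [x + mult * y for x, y in zip(line, A_j)]
        # Automatic check: summing across must reproduce the check entry.
        assert abs(sum(line[:-1]) - line[-1]) < 1e-9
        B_lines.append([-x / line[i] for x in line])
        A_lines.append(line)

    # Back solution: substitute the betas already found into each B line.
    beta = [0.0] * k
    for i in reversed(range(k)):
        beta[i] = B_lines[i][k] + sum(B_lines[i][j] * beta[j]
                                      for j in range(i + 1, k))
    # Final check: the betas must satisfy the original equations.
    for i in range(k):
        assert abs(sum(R[i][j] * beta[j] for j in range(k)) - r[i]) < 1e-9
    return beta

def multiple_r(beta, r):
    """Multiple r from standard-score weights: R^2 = sum of beta_i * r_i."""
    return sum(b * c for b, c in zip(beta, r)) ** 0.5
```

Applied to $r_{12} = .400$, $r_{13} = .000$, $r_{23} = .707$ (the suppressant example discussed later in this chapter), `doolittle([[1.0, 0.707], [0.707, 1.0]], [0.4, 0.0])` gives weights of about .800 and -.565, and `multiple_r` about .566; the small departure from -.566 reflects the rounding of .707.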

In order to put our results into useful form, we ordinarily require
the multiple regression equation in raw score form, and for this
we need the B coefficients and A as called for in formula (66) ex-
tended for more variables. To get the multiple correlation coeffi-
cient, the β's and appropriate r's are substituted in formula (69a),
and from (70) we obtain the standard error appropriate for judging
the accuracy of predictions made by the calculated regression
equation. Table 16 includes these additional values.

If the problem involves analysis rather than prediction, one need
not set up the regression equation or calculate the error of estimate.
Appropriate interpretations would depend upon the β's and $r_{1.2345}$
(see discussion, pp. 176-177).


SAMPLING ERRORS 


The classical formula for the standard error of a multiple correla-
tion involving n variables is

$$\sigma_{r_{1.23\cdots n}} = \frac{1 - r^2_{1.23\cdots n}}{\sqrt{N - n}} \qquad (74)$$





If N is very large, say over 500, and if the value of $r_{1.23\cdots n}$ is
not too high, this formula will provide a satisfactory approxima-
tion. But when N is small and the number of variables, n, is
large relative to the size of the sample, the above formula yields
an underestimate of the error. The significance of the multiple
correlation coefficient can best be ascertained by the analysis of
variance technique, to be discussed in a later chapter.

Closely related to sampling is the shrinkage of the multiple
correlation coefficient. This may be best understood by taking an
extreme case. For the ordinary bivariate correlation, it is evident
on a moment's reflection that, if N = 2, the correlation between
the two variables must be perfect positive or perfect negative (it
would be indeterminate if for either variable the two scores were
the same); the regression line will pass through both plotted points
on the scatter diagram. That is, in so far as prediction is con-
cerned, there would be no error. In the case of three variables
and N = 3, it would be possible to pass a plane through all three
plotted points. In general, if n = N, we would get a perfect
multiple r. Obviously N must be greater than n before any mean-
ing can be attached to a multiple r. As n approaches N, the value
of multiple r always approaches unity.

This suggests that, when n is large relative to N, the real signifi-
cance of an obtained multiple r is questionable. In other words,
the multiple correlation coefficient is subject to a positive bias,
the magnitude of which depends upon the degree to which n
approaches N. An unbiased estimate, r', of the universe value of
$r_{1.23\cdots n}$ can be obtained from

$$r'^2 = 1 - (1 - r^2_{1.23\cdots n})\frac{N - 1}{N - n} \qquad (75)$$

This is sometimes known as a correction for shrinkage, since it
has been observed that in general the correlation between observed
and predicted values for a new sample tends to be less than the
multiple r obtained by means of the β's computed from the original
sample. Obviously, if N is very large, say 500, and n small, say
10, the amount of bias or expected shrinkage is so small as to be
negligible.
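To see how quickly the correction matters, the Python sketch below applies the commonly used shrinkage formula $r'^2 = 1 - (1 - r^2)(N - 1)/(N - n)$, with n counting all variables, criterion included. That this is exactly the form intended here is an assumption, as is the choice to report a negative $r'^2$ as zero; the function name and the illustrative r of .50 are hypothetical.

```python
import math

def shrunken_r(r, N, n):
    """Shrinkage-corrected multiple r (a sketch).

    Assumes r'^2 = 1 - (1 - r^2) * (N - 1) / (N - n), where n counts all
    variables, criterion included. A negative r'^2 is reported as zero.
    """
    r2 = 1 - (1 - r ** 2) * (N - 1) / (N - n)
    return math.sqrt(max(r2, 0.0))

# Hypothetical obtained multiple r of .50 with n = 10 variables
for N in (500, 100, 30):
    print(N, round(shrunken_r(0.50, N, 10), 3))
# 500 0.486
# 100 0.418
# 30 0.0
```

With N = 500 the corrected value barely moves, while with N = 30 and 10 variables an apparent r of .50 is no evidence of any relationship at all.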

CAUTIONS AND REMARKS

As already indicated, there are two principal uses for the multi-
ple correlation technique: (1) it yields the optimum weighting for
combining a series of variables in predicting a criterion and pro-
vides an indication of the accuracy of subsequent predictions;
(2) it permits the analyzing of variation into component parts.
There are certain more or less obvious pits into which the unwary
user of the multiple regression and correlation method may fall.
For example, it is possible to write a multiple regression equation
for predicting school achievement ($X_1$) from a knowledge of age
($X_2$) and mental age ($X_3$). In standard score form it might be
$z'_1 = .27z_2 + .67z_3$, from which one might infer that school
achievement depends upon age to a certain extent but upon mental
age to a greater extent. However, it is entirely possible to argue
that mental age depends partly upon school achievement. One
could also use the same data to write the regression for age on
mental age and school achievement; thus $z'_2 = .56z_1 + .06z_3$,
from which the unwary might conclude that age depends upon
school achievement and mental age.

Multiple correlation may be particularly deceptive when one has
available several variables, each of which yields a rather low corre-
lation with the criterion, and from which those yielding the higher
correlations with the criterion are selected for the prediction equa-
tion. Such selecting tends to capitalize on correlations which
might be high because of sampling fluctuations. For example,
the author was once requested to compute the multiple r for an
11-variable problem. None of the 10 variables showed a very high
correlation with the criterion, the highest being .27. The resulting
multiple r was .44, which was statistically significant for the sample
of 89 cases. When it was learned that 10 variables out of 40 had
been selected as the most promising, i.e., because they showed the
highest correlations with the criterion, the real significance of the
multiple r of .44 was questioned. That it really was misleading
was clearly evidenced by the fact that for a second and similar
sample the variable originally yielding the highest r (.27) now
produced an r of -.11. That is, the supposedly best single pre-
dictor was actually of very doubtful value, and this, coupled with
a tendency for the next highest r's to drop appreciably, meant that
predictions by the regression equation could not be as good as was
inferred from the multiple r of .44.

Nothing has been said as yet concerning the principal assump-
tion and consequent limitation in the use of multiple regression
equations, namely, that regressions for the first-order correlations
must be linear. There are methods for handling multiple correla-
tion for curvilinear regressions; the reader is referred to
Ezekiel's Methods of correlation analysis.†

It is not obvious from our discussion that, in general, the increase
in the multiple correlation which results from adding variables
beyond the first five or six is very small. This phenomenon of
diminishing returns would not, of course, operate if we were to find
an additional variable which correlated much more highly with
the criterion than any of those already utilized.

Another fact which may not be apparent to the reader is that we
can expect the multiple r to be higher when the intercorrelations
among the predictors are low instead of high. This point can be
easily demonstrated to one's own satisfaction by computing the
multiples for, say, $r_{12} = .50$, $r_{13} = .50$, and varying values for
$r_{23}$.

An interesting paradox of multiple correlation, and an exception
to the fact mentioned in the previous paragraph, is that it is possible
to increase prediction by utilizing a variable which shows no, or
low, correlation with the criterion, provided it correlates well with
a variable which does correlate with the criterion. Thus, if
$r_{12} = .400$, $r_{13} = .000$, and $r_{23} = .707$, the regression equation
will be $z'_1 = .800z_2 - .566z_3$, and $r_{1.23}$ will equal .566. It is thus
seen that, when $z_3$ is combined with $z_2$, an appreciable gain in pre-
diction occurs even though, when taken alone, $z_3$ is worthless as a
predictor of $z_1$.

Such a variable has been termed a "suppressant." One does
not quickly see just how a suppressant variable, showing no corre-
lation with the criterion, can increase the accuracy of prediction.
Perhaps this point can be explained by reasoning by way of the
notion that correlation can be thought of in terms of common ele-
ments (pp. 140-141). Suppose that $X_1$ is composed of 10 elements,
$X_2$ of 10, $X_3$ of 5, and suppose that $X_1$ and $X_2$ have 4 elements in
common, $X_2$ and $X_3$ have 5 elements in common, and $X_1$ and $X_3$
have no overlapping elements. Diagrammatically, the variables
and elements would be

X1:  a a a a a a b b b b
X2:              b b b b c d d d d d
X3:                        d d d d d

† Ezekiel, M., Methods of correlation analysis. New York: John Wiley, 1941.




By substituting in the common element formula for correlation,
we find $r_{12} = .400$, $r_{13} = .000$, $r_{23} = .707$. These lead to
$z'_1 = .800z_2 - .566z_3$, and $r_{1.23} = .566$. Variable $X_3$ has a negative
regression weight, i.e., by the use of $X_3$ something is being sub-
tracted or suppressed. As set up here for illustrative purposes,
all the elements of $X_3$ are contained in $X_2$; these elements are not
related to $X_1$ and hence their presence in $X_2$ must tend to lower
the correlation between $X_1$ and $X_2$. If these elements could be
suppressed, the correlation between $X_1$ and $X_2$ minus the irrelevant
(so far as $X_1$ is concerned) elements of $X_2$ should be higher than
$r_{12}$. Actually, if we think of the d elements of the diagram as
being nonexistent, we would have variation in $X_2$ dependent upon
only 5 elements, 4 of which overlap with $X_1$. The correlation
between $X_1$ and the abridged $X_2$ would be $4/\sqrt{10(5)}$ or .566,
which has exactly the same value as the multiple r obtained above.
This exact correspondence to $r_{1.23}$ will be obtained only when all
the $X_3$ elements are contained in $X_2$. If $X_3$ contains other ele-
ments, its use as a suppressant will aid in predicting $X_1$, but the
resulting $r_{1.23}$ will not correspond to an r deducible from the
common element formula. The reason for this is left as an exercise.
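The common element argument can be checked numerically. The Python sketch below assumes the simple common element formula for elements of equal variance, $r = n_c/\sqrt{n_a n_b}$ (shared elements over the geometric mean of the two element counts); the element labels follow the a, b, c, d diagram and are otherwise arbitrary. It computes the three correlations, the standard-score weights for the three-variable case, and the multiple r, and then compares that multiple r with the correlation between $X_1$ and $X_2$ stripped of the $X_3$ elements.

```python
import math

# Hypothetical element labels following the diagram: a's belong only to X1,
# b's are shared by X1 and X2, c belongs only to X2, d's are shared by X2 and X3.
X1 = {f"a{i}" for i in range(6)} | {f"b{i}" for i in range(4)}
X2 = {f"b{i}" for i in range(4)} | {"c0"} | {f"d{i}" for i in range(5)}
X3 = {f"d{i}" for i in range(5)}

def r_common(u, v):
    """Common-element correlation: shared elements / sqrt(n_u * n_v)."""
    return len(u & v) / math.sqrt(len(u) * len(v))

r12, r13, r23 = r_common(X1, X2), r_common(X1, X3), r_common(X2, X3)

# Standard-score regression weights for predicting z1 from z2 and z3
d = 1 - r23 ** 2
b2 = (r12 - r13 * r23) / d
b3 = (r13 - r12 * r23) / d
R = math.sqrt(b2 * r12 + b3 * r13)      # multiple r, since R^2 = sum of beta * r

abridged_X2 = X2 - X3                   # X2 with the X3 elements suppressed
print(round(r12, 3), round(r13, 3), round(r23, 3))   # 0.4 0.0 0.707
print(round(b2, 3), round(b3, 3), round(R, 3))       # 0.8 -0.566 0.566
print(round(r_common(X1, abridged_X2), 3))            # 0.566
```

The negative weight on $X_3$ and the agreement between the last two printed values are the suppression effect described above.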

The student, by resort to the notion of common elements, may
secure a better understanding of the proposition that a higher
multiple r is obtainable when the correlations with the criterion are
high and the correlations between the predictors low or zero.
The reader should be warned, however, that such a condition is
hard to realize in practice, as is also the finding of variables which
will qualify as suppressants.
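The demonstration suggested earlier, computing the multiple for $r_{12} = r_{13} = .50$ with varying $r_{23}$, can be sketched in Python with the standard three-variable expression $r^2_{1.23} = (r_{12}^2 + r_{13}^2 - 2r_{12}r_{13}r_{23})/(1 - r_{23}^2)$. The function name is ours and the particular $r_{23}$ values are illustrative.

```python
import math

def multiple_r_3(r12, r13, r23):
    """Multiple r of X1 on X2 and X3 from the three intercorrelations."""
    r2 = (r12 ** 2 + r13 ** 2 - 2 * r12 * r13 * r23) / (1 - r23 ** 2)
    return math.sqrt(r2)

# Same validities, increasingly redundant predictors
for r23 in (0.0, 0.2, 0.5, 0.8):
    print(f"r23 = {r23:.1f}  R = {multiple_r_3(0.5, 0.5, r23):.3f}")
```

Under these assumptions the multiple falls from .707 at $r_{23} = 0$ to about .527 at $r_{23} = .80$; with $r_{12} = r_{13} = .50$ the expression reduces to $r^2_{1.23} = .50/(1 + r_{23})$.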

NOTE ON NOTATION 

The symbol $r_{1.23}$ has been used to represent the correlation
(multiple) between $X_1$ and the best combination of $X_2$ and $X_3$.
This should not be confused with $r_{12.3}$, which indicates the correla-
tion (partial) between $X_1$ and $X_2$ with the effect of $X_3$ ruled out
or held constant. The symbol $\sigma_{y.x}$, it will be recalled, stood for
the standard error of estimate of Y as estimated from X; $\sigma_{1.2}$
would be the error of $X_1$ when estimated from $X_2$; and $\sigma_{1.23}$ would
be the standard error of estimate of $X_1$ when estimated from $X_2$
and $X_3$ by means of the multiple regression equation.




In the foregoing discussion, $\beta_2$ has been used as the symbol for
the regression weight of $X_2$. A more formal, albeit cumbersome,
notation would be $\beta_{12.345}$, which would be read as the regression
of $X_1$ on $X_2$, i.e., the coefficient for $X_2$, when used in combination
with $X_3$, $X_4$, and $X_5$. It is not an accident that the subscript
pattern resembles that for the partial correlation coefficient. If
we were dealing with a three-variable problem, $\beta_2$ could be written
as $\beta_{12.3}$. This notation really means that we have the net regres-
sion of $X_1$ on $X_2$ when $X_3$ is held constant. Hence the coefficients
are spoken of as partial regression coefficients. As a
matter of fact, these partial or multiple regression coefficients can
be computed by way of partial correlation coefficients, but the
method is not nearly so straightforward and self-checking as the
Doolittle procedure.



CHAPTER 12 


Other Correlation Methods


The product moment correlation measure is applicable only
when the two variables are graduated, is restricted by the assump-
tion of linearity of regression, and needs careful qualifying if
either or both variables yield skewed distributions. There are,
therefore, many problems for which it is inappropriate. In gen-
eral, the majority of the situations which are met in practice can
be handled by some type of correlational technique.

There are no general rules to follow in the case of variables
yielding skewed distributions. Frequently, one can use a logarith-
mic transformation of such a variable and thereby secure scores
which are at least approximately normal, or one may deliberately
normalize the distribution by converting the raw scores into T
scores. When one considers the arbitrary units involved in most
psychological measurement, such a procedure would seem not
only permissible but also defensible, in that the correlational de-
scription of the relationship need not be qualified because of
skewness.
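As one illustration of normalizing, the Python sketch below converts raw scores into normalized T scores with mean 50 and SD 10. The particular recipe is an assumption on our part, not a procedure stated here: each score's percentile rank is taken at the midpoint of its group of tied scores, converted to a unit-normal deviate with `statistics.NormalDist.inv_cdf`, and rescaled as 50 + 10z. The function name and the raw scores are hypothetical.

```python
from statistics import NormalDist

def normalized_t_scores(scores):
    """Normalized T scores (mean 50, SD 10) from raw scores.

    Percentile ranks are taken at the midpoint of each group of tied
    scores, so every rank lies strictly between 0 and 1.
    """
    n = len(scores)
    unit = NormalDist()
    t_scores = []
    for x in scores:
        below = sum(1 for y in scores if y < x)
        ties = sum(1 for y in scores if y == x)
        pr = (below + 0.5 * ties) / n
        t_scores.append(50 + 10 * unit.inv_cdf(pr))
    return t_scores

# A positively skewed set of raw scores (hypothetical)
raw = [1, 1, 2, 2, 2, 3, 4, 6, 9, 15]
print([round(t, 1) for t in normalized_t_scores(raw)])
```

The transformation keeps the rank order of the raw scores but pulls in the long upper tail, which is the sense in which skewness is removed.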

The situations arising most frequently in practice, for which
measures of correlation are apt to be needed, can be subsumed
under the following six headings: (1) graduated measures for one
variable, dichotomized or two-category information for the second
variable; (2) both variables dichotomized; (3) three or more cate-
gories for one variable and two or more for the second; (4) three
or more categories for one variable and a graduated series of meas-
ures for the other; (5) both variables graduated, with curvilinear
relationship; (6) data in the form of rank orders.

An estimate of the degree of correlation for each of the above
situations can be obtained provided certain assumptions concern-
ing the variables can be regarded as tenable. Ordinarily the
graduated variable can be thought of either as being continuous
or as progressing in a sufficient number of discrete steps so as to
give the appearance of continuity. The approach to normality
for such series can, obviously, be specified. The nature of the
categorized variable, whether discrete or continuous, can ordi-
narily be ascertained on logical grounds, but the question of
whether a continuous variable for which we have only a distribu-
tion by categories would yield a normal distribution if we had
some measuring stick for the trait is not easy to answer.

BISERIAL CORRELATION 

When one variable is measured in a graduated fashion and the
other is in the form of a dichotomy, we have the so-called biserial
situation, for which there are 2 measures of correlation: biserial r
and point biserial r. The difference between these 2 measures
depends essentially on the type of assumption which is made con-
cerning the nature of the dichotomized variable.


Table 17 Biserial Table for "Abstract Words" as X and Binet IQ as Y

                         Item
   IQ           Fail       Pass      Totals
                (1)        (2)
 145-149                     1          1
 140-144
 135-139                     1          1
 130-134                     3          3
 125-129                     4          4
 120-124                     6          6
 115-119                    10         10
 110-114                     7          7
 105-109          1          8          9
 100-104          1          5          6
  95-99           4          9         13
  90-94           7          6         13
  85-89           9          2         11
  80-84           3          1          4
  75-79           4                     4
  70-74           5                     5
  65-69
  60-64           3                     3
 Totals          37         63        100
 Means         84.43     109.86     100.45

$\sigma_y = 17.69$, $p_1 = .37$, $p_2 = .63$, $z = .378$

$$r_b = \frac{(109.86 - 84.43)(.37)(.63)}{(.378)(17.69)} = .89$$

Or by formula (76a),

$$r_b = \frac{(109.86 - 100.45)(.63)}{(.378)(17.69)} = .89$$

$$r_{pb} = \frac{109.86 - 100.45}{17.69}\sqrt{\frac{.63}{.37}} = .69$$

The most typical example of situations calling for one or the
other of these measures is to be found in the test (mental and
personality) field: the correlation between an item scored as pass
or fail (yes or no, like or dislike, etc.) and a graduated criterion
variable (or a total score on all of a set of items). We need to
know each individual's score on the graduated variable and the
dichotomy to which he belongs. Then we can make a distribution
or scattergram with from 12 to 20 intervals for the graduated
variable along the y axis, and with 2 intervals for the 2 categories
along the x axis. Such a correlation scattergram is given in Table
17, which involves pass-fail on "abstract words" vs. composite
IQ on Forms L and M of the 1937 Stanford-Binet. It is obvious
that there is a tendency for those who fail the item to have lower
IQ's than those who pass; performance on the item is related to
IQ.

Biserial coefficient, $r_b$. If it can be assumed that underlying
the dichotomy there is a continuous variable, we can obtain a
measure of correlation which is an estimate of what the product
moment correlation would be in case the dichotomous variable
were measured in such a way as to produce a normal distribution.
This estimate is given by

$$r_b = \frac{(M_2 - M_1)(p_1 p_2)}{z\,\sigma_y} \qquad (76)$$

or by the exact equivalent

$$r_b = \frac{(M_2 - M_y)\,p_2}{z\,\sigma_y} \qquad (76a)$$

in which
   $p_1$ = proportion of cases in the first category,
   $p_2$ = proportion of cases in the second category,
   $M_1$ = mean of Y's for cases in the first category,
   $M_2$ = mean of Y's for cases in the second category,
   $M_y$ = mean of all the Y scores,
   $\sigma_y$ = SD of all the Y scores,
   z = ordinate of the unit normal curve at the point where $p_1$ (or $p_2$)
       cases are cut off; it is determined by entering $p_1$ or $p_2$,
       whichever is smaller, as a q value in Table A, then reading off
       the adjacent ordinate value in the ordinate column of the table
       (interpolating if necessary).





Formula (76a) is the more convenient when each of a series of
items is to be correlated against the same graduated variable.
The computations are illustrated in Table 17.
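The Table 17 computations can be reproduced in Python as a check. The sketch assumes that interval midpoints (147, 142, ..., 62) stand for the scores in each interval and that $\sigma_y$ uses N as divisor, which reproduces the 17.69 of the table; the unit-normal ordinate z is obtained from `statistics.NormalDist` instead of Table A. Variable names are ours.

```python
import math
from statistics import NormalDist

# Table 17 frequencies as (interval midpoint, fail, pass);
# the empty intervals 140-144 and 65-69 are omitted.
table17 = [
    (147, 0, 1), (137, 0, 1), (132, 0, 3), (127, 0, 4), (122, 0, 6),
    (117, 0, 10), (112, 0, 7), (107, 1, 8), (102, 1, 5), (97, 4, 9),
    (92, 7, 6), (87, 9, 2), (82, 3, 1), (77, 4, 0), (72, 5, 0), (62, 3, 0),
]

n1 = sum(f for _, f, _ in table17)          # first category: fail
n2 = sum(p for _, _, p in table17)          # second category: pass
N = n1 + n2
M1 = sum(x * f for x, f, _ in table17) / n1
M2 = sum(x * p for x, _, p in table17) / n2
My = sum(x * (f + p) for x, f, p in table17) / N
sigma_y = math.sqrt(sum((x - My) ** 2 * (f + p) for x, f, p in table17) / N)
p1, p2 = n1 / N, n2 / N

# Ordinate of the unit normal curve at the point of dichotomy
unit = NormalDist()
z = unit.pdf(unit.inv_cdf(min(p1, p2)))

rb = (M2 - M1) * p1 * p2 / (z * sigma_y)          # formula (76)
rb_a = (M2 - My) * p2 / (z * sigma_y)             # formula (76a)
rpb = (M2 - M1) * math.sqrt(p1 * p2) / sigma_y    # point biserial r

print(round(M1, 2), round(M2, 2), round(My, 2), round(sigma_y, 2), round(z, 3))
# 84.43 109.86 100.45 17.69 0.378
print(round(rb, 2), round(rb_a, 2), round(rpb, 2))
# 0.89 0.89 0.69
```

Because $M_y = p_1M_1 + p_2M_2$, the two biserial formulas agree exactly, apart from floating-point rounding.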

In the derivation of $r_b$ it is assumed not only that a normal dis-
tribution underlies the dichotomy but also that the regressions
would be linear if the dichotomized variable were measured. The
latter assumption cannot be checked; it is apt to hold for ability
variables but may be violated for personality traits. The former
assumption has troubled many. Actually, the main issue is the
question of continuity. Consider the pass-fail dichotomy: it is
obvious that failing a test item represents anything from a dismal
failure up to a near pass, whereas passing the item involves barely
passing up to passing with the greatest of ease. Such a line of
reasoning is certainly presumptive evidence for continuity, and a
similar argument can be advanced as regards yes-no, like-dislike,
and similar categories. Given a continuous trait, it is usually (if
not always) possible to construct a test thereof which yields a
normal distribution, and consequently we need not worry about
the mathematical assumption of normality when using $r_b$. We
can justify the use of $r_b$ with obviously continuous variables by
saying, as pointed out earlier, that the obtained coefficient repre-
sents what we would expect the product moment correlation to
be if we had a measuring scale, for the dichotomized trait, which
yielded a normal distribution.

The sampling error of biserial r is given approximately by

$$\sigma_{r_b} = \frac{\sqrt{p_1 p_2}/z - r_b^2}{\sqrt{N}} \qquad (77)$$
As an exercise, the student should compare the magnitude of the
sampling error of biserial r for various cuts (p values) with that
of the product moment r as given by the analogous classical formula,
$\sigma_r = (1 - r^2)/\sqrt{N}$. It might be anticipated that the sampling
error will be large when dichotomies are extreme, i.e., involve cuts
yielding very high (and low) p's. Thus, if N = 100 and we have
a .95-.05 cut, it follows that one of the means used in computing
$r_b$ by formula (76) will be based on only 5 cases and therefore will
be subject to rather large sampling fluctuation, which incidentally
will not be counterbalanced entirely by the relatively greater sta-
bility of the other mean. It may occur to the reader that the use
of formula (76a) would overcome this difficulty, since one can
always arrange to use the mean for the category having the larger
number of cases, thereby avoiding the unstable mean. This
appears plausible enough; its refutation is left to the student.

The fact that the sampling error for biserial r is large when ex-
treme dichotomies are involved should serve as a warning. Unless
N is fairly large, one should not place much confidence in a biserial
r based on cuts more extreme than .10 (or .90).
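The exercise of comparing sampling errors can be sketched in Python. We assume the usual approximation $\sigma_{r_b} = (\sqrt{p_1p_2}/z - r_b^2)/\sqrt{N}$ and the classical $\sigma_r = (1 - r^2)/\sqrt{N}$; the function names, N = 100, and the particular cuts are illustrative.

```python
import math
from statistics import NormalDist

def se_biserial(r, p, N):
    """Approximate standard error of biserial r for a p : (1 - p) cut."""
    unit = NormalDist()
    z = unit.pdf(unit.inv_cdf(p))   # ordinate at the point of dichotomy
    return (math.sqrt(p * (1 - p)) / z - r ** 2) / math.sqrt(N)

def se_classical(r, N):
    """Classical standard error of a product moment r."""
    return (1 - r ** 2) / math.sqrt(N)

N = 100
for p in (0.50, 0.70, 0.90, 0.95):
    print(f"cut {p:.2f}: se(rb) = {se_biserial(0.0, p, N):.3f}, "
          f"se(r) = {se_classical(0.0, N):.3f}")
```

With r = 0 and N = 100 the biserial standard error runs from about .125 at a .50-.50 cut to about .211 at a .95-.05 cut, against .100 for the product moment r.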

Since no r to z transformation is available for use with biserial
r, the difficulty of skewed sampling distributions for high $r_b$'s
cannot be overcome. In testing the null hypothesis (that no corre-
lation exists), the r term in formula (77) may be dropped. For
N small, a more adequate test of the significance of $r_b$ is possible
by way of the t test for the difference between $M_2$ and $M_1$.

Although $r_b$ is an estimate of a product moment r, there are
limitations as to its interpretation. It is, of course, a measure of
the degree of relationship between 2 variables. It does not, how-
ever, enter into prediction formulas, nor does it lead to an error
of estimate. If we know to which X category an individual be-
longs, the predicted Y is simply the mean of the Y scores for that
category, and the error of such an estimate is the standard devia-
tion of the Y scores in the given category. This error of estimate
would not equal $\sigma_y\sqrt{1 - r_b^2}$.

If we have a Y score to use in predicting an individual's X
category, we estimate on the basis of the tendency for those pos-
sessing Y scores in a given interval to fall predominantly into the
first or second category on X. The error for such a prediction
must depend upon the relative frequencies in these 2 categories
for individuals possessing a given Y score. Thus, if the frequencies
in the first and second categories were 18 and 6 (for a given Y
interval), the error might be stated something like this: the odds
are 3 to 1 that the given individual's X position is in the first
category, i.e., 75 per cent of the time the prediction would be
correct. But such a percentage statement might itself be subject
to grave sampling error since it is based on a small N; and such
a statement of error might need to be qualified according to the
p's. Why?





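Point biserial, $r_{pb}$. If the dichotomous trait is truly discrete,
an appropriate measure of correlation is given by

$$r_{pb} = \frac{M_2 - M_1}{\sigma_y}\sqrt{p_1 p_2} \qquad (78)$$

or by

$$r_{pb} = \frac{M_2 - M_y}{\sigma_y}\sqrt{\frac{p_2}{p_1}} \qquad (78a)$$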


Actually, $r_{pb}$ is the product moment correlation between Y and
the X categories scored as either 0 or 1 (scoring as 1 and 2, or as
4 and 10, or any other 2 values will yield the same correlation).
The value of $r_{pb}$ for the data of Table 17 is .69, compared to an
$r_b$ of .89. The magnitude of $r_{pb}$ tends to be less than that of $r_b$
for the same set of data, as can be seen by examining the following
connection between the 2 coefficients:

$$r_{pb} = r_b\,\frac{z}{\sqrt{p_1 p_2}}$$


For a 50-50 dichotomy, z = .3989 and $r_{pb} = .798r_b$, and as the
dichotomy departs farther and farther from 50-50 the discrepancy
between $r_{pb}$ and $r_b$ increases. For a 10-90 cut we have $r_{pb} =
.585r_b$. The maximum degree of correlation between a dichot-
omous variable and a normally distributed variable will occur
when there is no overlap between the Y distributions for the 2
categories. For such a situation $r_b$ will be either +1.00 or -1.00
regardless of the cut, whereas $r_{pb}$ will be ±.798 for a 50-50 cut
and only ±.585 for a 10-90 cut. These 2 coefficients are not on
the same scale; they will agree only when there is exactly no rela-
tionship between the 2 variables. Even if the dichotomous vari-
able were a genuine point variable, $r_{pb}$ as an expression of the
degree of relationship would not be comparable either to $r_b$ or
to the product moment r between 2 variables measured in a grad-
uated fashion.
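Dividing the point biserial formula by formula (76) gives $r_{pb}/r_b = z/\sqrt{p_1p_2}$, a factor that depends only on the cut. The Python sketch below tabulates it; the function name is ours and z again comes from `statistics.NormalDist` rather than Table A.

```python
import math
from statistics import NormalDist

def pb_over_b(p):
    """Factor z / sqrt(p * q) converting r_b to r_pb for a p : q cut."""
    unit = NormalDist()
    z = unit.pdf(unit.inv_cdf(p))   # ordinate at the point of dichotomy
    return z / math.sqrt(p * (1 - p))

for p in (0.50, 0.30, 0.20, 0.10, 0.05):
    print(f"{p:.2f}-{1 - p:.2f} cut: r_pb = {pb_over_b(p):.3f} r_b")
```

The .50 and .10 lines reproduce the factors .798 and .585 quoted above; the factor never reaches 1, which is why $r_{pb}$ runs below $r_b$.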

Despite the fact that true point variables are practically non-
existent in psychology, and despite the difficulties of interpreting
$r_{pb}$ as a terminal descriptive statistic, $r_{pb}$ has a rightful place in
certain analytical and practical work where the 2 categories are
arbitrarily, for convenience, assigned point scoring values of, say,
0 and 1. For example, if a dichotomized variable with point
scoring were included in an n-variable multiple regression equa-
tion, point biserial r's would be the correct values for the correla-
tion of the dichotomized variable with the remaining n - 1 variables.

For the large sample situation the significance of $r_{pb}$ (as a
deviation from zero) may be tested by using $\sigma_{r_{pb}} = 1/\sqrt{N}$ as its
standard error. For small samples, the t test for the difference,
$M_2 - M_1$, is appropriate.
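Two claims above can be checked on a small hypothetical data set: that $r_{pb}$ is simply the product moment r between Y and the category codes, whatever 2 code values are used, and that the small-sample t test on $M_2 - M_1$ carries the same information. The Python sketch assumes the pooled-variance form of the t test and uses the identity $t = r_{pb}\sqrt{N - 2}/\sqrt{1 - r_{pb}^2}$; the scores and function names are invented for illustration.

```python
import math

def pearson_r(x, y):
    """Product moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def pooled_t(y1, y2):
    """Two-sample t for M2 - M1 with pooled within-group variance."""
    n1, n2 = len(y1), len(y2)
    m1, m2 = sum(y1) / n1, sum(y2) / n2
    ss = sum((v - m1) ** 2 for v in y1) + sum((v - m2) ** 2 for v in y2)
    s2 = ss / (n1 + n2 - 2)
    return (m2 - m1) / math.sqrt(s2 * (1 / n1 + 1 / n2))

# Hypothetical scores: first four in category 1, last four in category 2
y = [3, 5, 4, 6, 8, 7, 9, 10]
codes01 = [0, 0, 0, 0, 1, 1, 1, 1]
codes4_10 = [4, 4, 4, 4, 10, 10, 10, 10]

r01 = pearson_r(codes01, y)
r410 = pearson_r(codes4_10, y)
N = len(y)
t_from_r = r01 * math.sqrt(N - 2) / math.sqrt(1 - r01 ** 2)
t_direct = pooled_t(y[:4], y[4:])
print(round(r01, 4), round(r410, 4), round(t_from_r, 4), round(t_direct, 4))
# 0.8729 0.8729 4.3818 4.3818
```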

A troublesome difficulty with the biserial coefficient, $r_b$, is that
it occasionally exceeds unity. The usually given explanation for
this is that the assumption of normality for the dichotomous
variable is not tenable, but it seems more likely that when such
r's occur it is because the graduated variable, for the combined
categories, is either platykurtic or bimodal in distribution.


TETRACHORIC CORRELATION 

When both variables yield only dichotomized information, as,
for example, 2 items scored as passed or failed, it is possible to
secure an estimate of what the correlation would be if the under-
lying traits were continuous and normally distributed or if they
were so measured as to give normal distributions. The measure
of relationship for such a situation is known as the tetrachoric
correlation coefficient, usually designated as $r_t$. It is not feasible
to derive here the formula for tetrachoric correlation, but perhaps
a few words will help one understand the reasoning back of the
formula.

Let us suppose that we have before us a scattergram for the
correlation between height and weight; let us further assume that
this scatter exhibits all the characteristics of a normal correla-
tional surface as defined by equation (41). That is, the 2 mar-
ginal distributions and all the vertical and horizontal array dis-
tributions are normal; the regressions are linear; and the arrays
are homoscedastic. For such a normal plot, it is possible, knowing
the degree of correlation and the means and sigmas of the 2 vari-
ables, to specify how many or what proportion of the cases will
fall in any given segment of the scatter plot. This can be done
by mathematical manipulation of formula (41) or by the aid of
Table VIII of Pearson's Tables for statisticians and biometricians,
Part II.*

Now, of course, if one had placed before him a scatter for height
vs. weight and were asked how many cases fell in that portion of
the table below 120 pounds and also below 68 inches, he would
simply count them. But suppose he were told that, when the
2 axes were cut at 120 pounds and 68 inches, the frequencies in
each of the 4 quadrants so formed were as shown in Table 18.


Table 18 Correlation for Height and Weight Dichotomized

                  Below      Above
                  120 lb     120 lb     Totals
Above 68 in         10         80          90
Below 68 in         60         50         110
Totals              70        130         200


The purpose of tetrachoric correlation is to ascertain the degree of
correlation which would permit the observed frequencies in such a
fourfold table. A more rigorous statement would be: Given the
4 frequencies, what should be the true correlation, for the scatter
underlying the fourfold table, in order to make the obtained
4 frequencies most likely?

In order to secure this estimate it is necessary to convert into a
proportion each of the 4 frequencies and each of the marginal
totals by dividing by N. For the fourfold table we may symbolize
the frequencies as in Table 19, the proportions as in Table 20.


Table 19 Frequencies                   Table 20 Proportions

         -       +                              -      +
  +      A       B      A + B            +      a      b      p
  -      C       D      C + D            -      c      d      q
       A + C   B + D      N                     q'     p'    1.0


Then, the tetrachoric coefficient can be obtained from the follow-
ing rather forbidding equation:

* Pearson, Karl, Tables for statisticians and biometricians, Part II. Cam-
bridge: Cambridge University Press, 1931.






$$\frac{c - qq'}{z_x z_y} = r + xy\frac{r^2}{2} + (x^2 - 1)(y^2 - 1)\frac{r^3}{6} + (x^3 - 3x)(y^3 - 3y)\frac{r^4}{24} + \cdots \qquad (79)$$

in which it is assumed that both q and q' are less than .50. The
general rule is to choose whichever is smaller, p or q, to pair with
whichever is smaller, p' or q'. This determines, logically, whether
a or b or c or d becomes a part of the formula. Thus one can have
c - qq' (as given), or b - pp', each of which will yield a positive r
for positive correlation or a negative r for negative correlation,
or one can have a - q'p or d - qp', each of which will yield an r
with sign opposite to its true sign. (It is, of course, here assumed
that reading to the right on the x axis and up on the y axis means
more of the traits.)

We must next specify the meaning of the x, y, and z's in formula
(79). As for biserial r, $z_y$ is the ordinate of the unit normal curve
where q proportion of the cases are cut off; $z_x$ has a similar meaning
for q'. The y represents the value on the base line of the unit
normal curve where q cases are cut off, i.e., the x/σ in Table A of
the Appendix, and x is similarly determined from a knowledge of
q'.
To equation (79) additional terms may be added which will
result in a closer approximation at the expense of a greater, if not
an impossible, amount of computation. For the given formula,
the solution for r involves determining the roots of a fourth-degree
or quartic equation. Either Horner's or Newton's methods, as
described in college algebra texts, will do the trick. The fourth-
degree equation will yield satisfactory approximations except
when r is high.
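Newton's method on the four-term series (79) is easy to sketch in Python. Written out, the series reads $(c - qq')/(z_xz_y) = r + xy\,r^2/2 + (x^2-1)(y^2-1)\,r^3/6 + (x^3-3x)(y^3-3y)\,r^4/24$. The sketch applies it to the cell c, low on both variables, using the signed deviates returned by `statistics.NormalDist.inv_cdf`; with signed deviates this form holds whatever the sizes of q and q', so the pairing rule given above, a convenience for hand work with Table A, is not needed here. The function name, starting value, and tolerance are our own choices, and the truncated series is only an approximation, poorest when r is high.

```python
from statistics import NormalDist

def tetrachoric_series(c, q, q_prime, r0=0.0, tol=1e-10, max_iter=50):
    """Solve the four-term tetrachoric series for r by Newton's method.

    c       : proportion in the cell low on both variables
    q       : proportion low on the y (row) variable
    q_prime : proportion low on the x (column) variable
    """
    unit = NormalDist()
    y = unit.inv_cdf(q)            # base-line deviate cutting off q
    x = unit.inv_cdf(q_prime)      # base-line deviate cutting off q'
    zy, zx = unit.pdf(y), unit.pdf(x)
    lhs = (c - q * q_prime) / (zx * zy)
    # Coefficients of r^2, r^3, r^4 in the series (coefficient of r is 1)
    k2 = x * y / 2
    k3 = (x ** 2 - 1) * (y ** 2 - 1) / 6
    k4 = (x ** 3 - 3 * x) * (y ** 3 - 3 * y) / 24
    r = r0
    for _ in range(max_iter):
        f = r + k2 * r ** 2 + k3 * r ** 3 + k4 * r ** 4 - lhs
        df = 1 + 2 * k2 * r + 3 * k3 * r ** 2 + 4 * k4 * r ** 3
        step = f / df
        r -= step
        if abs(step) < tol:
            break
    return r

# Table 18 as proportions of N = 200 ("low" = below 68 in, below 120 lb)
N = 200
c = 60 / N          # below 68 in and below 120 lb
q = 110 / N         # below 68 in
q_prime = 70 / N    # below 120 lb
print(round(tetrachoric_series(c, q, q_prime), 2))
```

For Table 18 the four-term series gives an $r_t$ of roughly .70.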

The solution of a quartic equation is not difficult, nor is it so
easy as to lead to mass production of tetrachoric r's. Fortunately,
it is no longer necessary to go through this tedious method for
getting an approximation to the value of $r_t$. Diagrams† are
available which enable one to determine quickly the value of $r_t$
for any given table of proportionate frequencies. Anyone having
as many as a half-dozen tetrachorics to compute will find it eco-
nomical to possess a copy of these diagrams.

† Chesire, L., Saffir, M., and Thurstone, L. L., Computing diagrams for the
tetrachoric correlation coefficient. Chicago: University of Chicago Bookstore,
1933.





The tetrachoric r is particularly useful in estimating the degree
of correlation between variables for which we have only dichoto-
mized information, but it can also be used instead of biserial r or
the product moment r, since situations for which these 2 methods
apply can readily be converted into fourfold tables by simply
dichotomizing the graduated variables. The advantage of so
estimating correlation is that tetrachoric r is much easier to deter-
mine (by using the computing diagrams) than is either
biserial r or the product moment r. Indeed, this fact of computa-
tional economy has led a number of investigators to use $r_t$ when
product moment r's could be determined. That such a practice
may be short-sighted economy becomes quite evident when we
turn to the sampling fluctuation of $r_t$.
The standard error of $r_t$ is closely approximated by

$$\sigma_{r_t} = \frac{\cdots}{z_x z_y\sqrt{N}} \qquad (80)$$


When this is compared to the classical formula for the standard
error of a product moment r, i.e., to $\sigma_r = (1 - r^2)/\sqrt{N}$, it will
be seen that the tetrachoric r has a much larger sampling error.
To illustrate the difference, the sigmas for 4 $r_t$'s for 2 different
dichotomies are presented in Table 21 along with the sigmas (by
the classical formula) of the corresponding product moment r's
for N = 100.


Table 21 Sampling Errors of r_t and r Compared

  r or r_t     p      p'     σ of r_t    σ of r
    .00       .50    .50       .157       .100
    .00       .80    .80       .204       .100
    .40       .50    .50       .130       .084
    .40       .80    .80       .182       .084
    .60       .50    .50       .115       .064
    .60       .80    .80       .150       .064
    .80       .50    .50       .073       .036
    .80       .80    .80       .095       .036


It can readily be seen from this table that $r_t$ is much less stable
than $r$; in fact, even for the most favorable comparison (.50-.50
cuts, low r's), the standard error of the tetrachoric coefficient is
more than 50 per cent greater than that for the product moment
coefficient. Since a standard error varies inversely as $\sqrt{N}$, this
means that one must have more than twice as many cases
($(.157/.100)^2 \approx 2.5$) to attain the same degree of sampling stability for a
tetrachoric as for a product moment correlation coefficient. For
80-20 cuts and low correlations, 4 times as many cases are
needed to have comparable sampling errors. For high correlations
and also for more extreme cuts, $r_t$ compares still less favorably
with $r$.
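The r = .00 rows of Table 21, and the "more than twice" and "4 times" statements, can be checked with a short Python sketch. It assumes the usual null-case approximation $\sigma_{r_t} = \sqrt{pqp'q'}/(z z' \sqrt{N})$, with $z$ and $z'$ the ordinates of the unit normal curve at the two points of dichotomy; because that approximation ignores the size of $r_t$, the sketch reproduces only the r = .00 rows, not those for larger correlations. The function names are illustrative, not taken from the text.

```python
from math import sqrt
from statistics import NormalDist

UNIT_NORMAL = NormalDist()  # mean 0, standard deviation 1


def ordinate_at_cut(p):
    """Height of the unit normal curve at the point that cuts off proportion p."""
    return UNIT_NORMAL.pdf(UNIT_NORMAL.inv_cdf(p))


def se_tetrachoric_null(p, p_prime, n):
    """Assumed null-case standard error of r_t: sqrt(p q p' q') / (z z' sqrt(N))."""
    q, q_prime = 1 - p, 1 - p_prime
    numerator = sqrt(p * q * p_prime * q_prime)
    return numerator / (ordinate_at_cut(p) * ordinate_at_cut(p_prime) * sqrt(n))


def se_product_moment(r, n):
    """Classical standard error of a product moment r: (1 - r^2) / sqrt(N)."""
    return (1 - r ** 2) / sqrt(n)


if __name__ == "__main__":
    n = 100
    for cut in (0.50, 0.80):
        se_t = se_tetrachoric_null(cut, cut, n)
        se_r = se_product_moment(0.0, n)
        # squared ratio = factor by which N must grow for r_t to match r
        print(f"cuts {cut:.2f}: sigma_rt = {se_t:.3f}, sigma_r = {se_r:.3f}, "
              f"cases needed = {(se_t / se_r) ** 2:.2f} x")
```

Run as a script, it reports squared ratios of roughly 2.5 and 4.2 for the .50-.50 and .80-.20 cuts. The ratio is squared because matching a standard error that shrinks with $\sqrt{N}$ requires N to grow by the square of the ratio.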

The foregoing discussion and further study of formula (80) lead
to 2 obvious conclusions.

First, the increasing sampling instability of $r_t$ as the dichotomies
become more extreme warns us that, unless N is large, one cannot
place much reliance on $r_t$ for cuts more extreme than .10-.90;
seldom will N be large enough to warrant confidence in a
tetrachoric based on cuts more extreme than .05-.95.

Second, in using $r_t$ instead of the product moment r when the
latter is calculable, one is always throwing away the equivalent
of more than half the available data. Thus the computational
economy may be an expensive luxury: it is very doubtful whether
the calculation of a product moment r for N cases will ever require
anything but a fraction of the expense of securing data on the
additional N cases needed to counterbalance the greater sampling error
incurred in using the tetrachoric coefficient.

As in the case of biserial r, no r to z transformation exists for handling
the sampling errors of high tetrachorics. For testing the null
hypothesis, that $r_t$ for the universe is zero, we may use a simpler
expression for its standard error, namely,

$$\sigma_{r_t} = \frac{\sqrt{pqp'q'}}{z_x z_y \sqrt{N}}$$

in which $z_x$ and $z_y$ are the ordinates of the unit normal curve at the points of dichotomy.

Another method for judging the significance of the correlation
computed from a fourfold table will be presented in the next
chapter.

The use of tetrachoric r is circumscribed by an assumption that
the underlying correlational surface is of the normal type. Among
other things, this implies (1) that the dichotomized traits are
continuous and normally distributed, and (2) that the regressions are
linear. Although, as discussed in connection with biserial r, we
are usually ignorant of the tenability of (1), this ignorance can be
partially overcome by regarding the correlation as that which
would obtain if the traits were normalized, i.e., it can be argued
that the use of tetrachoric r automatically normalizes the
distributions. It is not so easy to dispose of assumption (2), since the
normalizing of variables will not necessarily lead to linearity of
regression. The only consolation here is that measured
psychological traits are usually linearly related, if related at all.


FOURFOLD POINT CORRELATION 


If we can safely assume point distributions for both
dichotomous variables, a descriptive measure of correlation can be
obtained from a fourfold table (Table 19) by

$$r_p = \frac{BC - AD}{\sqrt{(A + B)(C + D)(A + C)(B + D)}} \qquad (81)$$

or from the table of proportionate frequencies (Table 20) by the
exact equivalent

$$r_p = \frac{c - qq'}{\sqrt{pqp'q'}} \qquad (81a)$$


The fourfold point correlation coefficient is frequently referred to
as the phi coefficient and designated by $\phi$. Actually, it is the
product moment correlation between the 2 variables each scored
in a point fashion (say, 0 and 1). Unlike the point biserial, $r_p$
can be unity, but only when $p = p'$. Otherwise (i.e., in nearly all
situations) $r_p$ and $r_t$ from the same table will differ in value, with
$r_p$ being lower, and the difference between the 2 becomes greater
as the dichotomy for either variable, or both, varies farther and
farther from 50-50.

A few examples will illustrate the difference in the magnitude of
$r_p$ and $r_t$. It is possible to have a fourfold table with 50-50 and
50-50 cuts which yields an $r_t$ of .50 and an $r_p$ of .32, and a table
with 16-84 and 16-84 cuts which yields an $r_t$ of .50 and an $r_p$ of
only .26. For similar tables (as regards cuts) we may have $r_p$
values of .59 and .52 when $r_t$ is .80. Thus, $r_p$ is not interpretable
on the same scale as $r_t$ (or r or $r_b$) as a measure (terminal
descriptive statistic) of the degree of relationship.

However, $r_p$ is useful (and necessary) in certain analytical work.
If variable U and variable V were dichotomous and each scored
as 0 and 1, then $r_p$ would be the appropriate value to use in
formula (37), p. 137, to obtain the variance of W, defined as U + V.
If formula (20), p. 59, for the standard error of the difference
between correlated proportions were written analogously to
formula (25b), p. 85, $r_p$ would be used. It is also used in the
statistical theory of mental tests.

For testing whether $r_p$ deviates significantly from zero we may
safely use $1/\sqrt{N}$ as its standard error when N is not small.
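The statement that $r_p$ is simply the product moment r between 2 variables scored 0 and 1 can be checked numerically. The Python sketch below assumes a Table 19 layout with A and B in the top row and C and D in the bottom row, B and C being the "agreeing" cells; that layout is an assumption chosen to match the $BC - AD$ numerator of formula (81), and the frequencies are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def phi_from_counts(a, b, c, d):
    """Formula (81): r_p = (BC - AD) / sqrt((A+B)(C+D)(A+C)(B+D))."""
    return (b * c - a * d) / sqrt((a + b) * (c + d) * (a + c) * (b + d))


def pearson_r(xs, ys):
    """Product moment r from deviation scores (population form)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (pstdev(xs) * pstdev(ys))


def expand_to_scores(a, b, c, d):
    """Turn fourfold counts into individual (x, y) pairs scored 0/1.

    Assumed layout: A = (x=0, y=1), B = (x=1, y=1), C = (x=0, y=0), D = (x=1, y=0).
    """
    pairs = [(0, 1)] * a + [(1, 1)] * b + [(0, 0)] * c + [(1, 0)] * d
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    return xs, ys


if __name__ == "__main__":
    a, b, c, d = 20, 30, 40, 10  # hypothetical frequencies, N = 100
    xs, ys = expand_to_scores(a, b, c, d)
    print(round(phi_from_counts(a, b, c, d), 4), round(pearson_r(xs, ys), 4))
```

Both printed values should agree (about .41), which is the sense in which $r_p$ "is" the product moment r for point-scored variables.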


CONTINGENCY COEFFICIENT 


The contingency coefficient is a measure of the degree of
association or correlation which exists between variables for which we
have only categorical information. The number of categories can
be such as to provide a 2 by 2 table (as for tetrachoric correlation)
or a 2 by 3, or a 3 by 3, or a 3 by 4, or a 4 by 4, or a k by l table.
This coefficient is stated in terms of a quantity known as $\chi^2$ (chi
square), thus

$$C = \sqrt{\frac{\chi^2}{N + \chi^2}} \qquad (82)$$

where

$$\chi^2 = \sum \frac{(O - E)^2}{E} \qquad (83)$$

in which O is the observed frequency (not percentage) and E is
the expected frequency for a given cell. In a 2 by 3 table there
would be 6 cells, hence 6 values summed to get $\chi^2$. The expected
cell frequencies for the contingency situation are those frequencies
which would exist if there were no association or relationship
between the given variables. It can thus be anticipated that
the larger the discrepancy between expected and observed
frequencies relative to the expected, the larger the value of $\chi^2$ and
consequently the higher the value of C.
An example will help to clarify the above. Suppose that we
have 2 variables, each of which yields 3 categories or classifications,
and that the observed frequencies are as given in Table 22, which
also contains the expected frequencies in parentheses. (Fictitious
data; marginal frequencies arranged so as to simplify exposition.)
In order to ascertain the expected frequencies needed in the
computation of $\chi^2$, we ask what cell frequencies would be expected if
there were no relationship, or zero association, between the 2
variables. Consider the 100 classified as college; if no association
existed, one would expect that these 100 would be distributed
according to a 1, 3, 1 ratio, i.e., in the same ratio as the marginal
frequencies at the bottom. Thus the expected cell frequencies for
the top row of cells would be 20, 60, 20. The expected frequencies
for the middle and bottom rows of cells should also be in a 1, 3, 1
ratio. Both these rows would have expected frequencies of 40,
120, 40.

Table 22. Contingency Table

                   Low        Medium       High       Total
College           5 (20)     45 (60)     50 (20)       100
High school      50 (40)    110 (120)    40 (40)       200
Grade school     45 (40)    145 (120)    10 (40)       200
Total              100         300         100         500

It will be noted that (1) the expected frequencies for the columns
follow, as they should, the ratio of 1, 2, 2, i.e., the ratio of 100, 200,
200 for the marginal frequencies on the right; (2) the expected
frequencies sum to the same marginal totals as the observed
frequencies; and (3) the expected frequencies actually exhibit a zero
relationship between the 2 characteristics.

In practice, the computation of the expected frequencies can
readily be accomplished by either of 2 schemes: (1) express the
marginal totals along the bottom as proportions of the total N,
then multiply each of the frequencies on the right margin by each
proportion in turn, entering the resulting product in the cell
common to the 2 marginal figures involved in the multiplication; or
(2) multiply any frequency on the bottom margin by any
frequency on the right margin, and then divide this product by N;
the result is the expected frequency for the cell common to the 2
marginals involved in the product.
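Scheme (2) translates directly into a few lines of Python. The sketch below, with a hypothetical helper name, applies it to the marginal totals of Table 22 and should reproduce the expected frequencies shown there in parentheses.

```python
def expected_frequencies(row_totals, col_totals):
    """Scheme (2): E = (row total x column total) / N for every cell."""
    n = sum(row_totals)
    if n != sum(col_totals):
        raise ValueError("row and column totals must have the same sum N")
    return [[r * c / n for c in col_totals] for r in row_totals]


if __name__ == "__main__":
    rows = [100, 200, 200]  # College, High school, Grade school
    cols = [100, 300, 100]  # Low, Medium, High
    labels = ["College", "High school", "Grade school"]
    for label, e_row in zip(labels, expected_frequencies(rows, cols)):
        print(label, e_row)
```

Scheme (1) gives the same numbers; multiplying before dividing, as here, simply postpones the division by N to a single step per cell.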

The computation of $\chi^2$ is now a routine matter. We simply
take each cell in turn, square the difference between the observed
and expected value, and divide by the expected frequency. Thus
we have

$$
\begin{aligned}
(5 - 20)^2/20 &= 11.25 \\
(45 - 60)^2/60 &= 3.75 \\
(50 - 20)^2/20 &= 45.00 \\
(50 - 40)^2/40 &= 2.50 \\
(110 - 120)^2/120 &= .83 \\
(40 - 40)^2/40 &= .00 \\
(45 - 40)^2/40 &= .62 \\
(145 - 120)^2/120 &= 5.21 \\
(10 - 40)^2/40 &= 22.50
\end{aligned}
$$

The sum of these quantities, 91.66, is $\chi^2$. To get C, the coefficient
of contingency, the value of $\chi^2$ is substituted in formula (82), thus

$$C = \sqrt{\frac{91.66}{500 + 91.66}} = .39$$
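The same arithmetic can be sketched in Python. The observed and expected frequencies are copied from Table 22; the helper names are hypothetical.

```python
from math import sqrt


def chi_square(observed, expected):
    """Formula (83): sum over every cell of (O - E)^2 / E."""
    return sum(
        (o - e) ** 2 / e
        for o_row, e_row in zip(observed, expected)
        for o, e in zip(o_row, e_row)
    )


def contingency_coefficient(chi2, n):
    """Formula (82): C = sqrt(chi^2 / (N + chi^2))."""
    return sqrt(chi2 / (n + chi2))


# Table 22: rows College, High school, Grade school; columns Low, Medium, High
observed = [[5, 45, 50], [50, 110, 40], [45, 145, 10]]
expected = [[20, 60, 20], [40, 120, 40], [40, 120, 40]]
n = sum(sum(row) for row in observed)  # 500

chi2 = chi_square(observed, expected)
c = contingency_coefficient(chi2, n)
print(f"chi-square = {chi2:.2f}, C = {c:.2f}")
```

Carried to full precision the sum is 91.67 rather than 91.66; the difference comes from rounding the individual terms (.83, .62) in the hand computation, and C is .39 either way.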

This strength of association is not to be interpreted as indicating
the same degree of relationship as an ordinary (or biserial or
tetrachoric) coefficient of the same magnitude. One reason for this is
that the upper limit for the contingency coefficient is a function
of the number of categories. The upper limit for a 2 by 2 table is
$\sqrt{1/2}$; for a 3 by 3 table, $\sqrt{2/3}$; for a 4 by 4 table, $\sqrt{3/4}$; for a 5 by 5
table, $\sqrt{4/5}$; for a k by k table, $\sqrt{(k - 1)/k}$. The exact upper
limits for rectangular tables, such as 2 by 3, 2 by 4, 3 by 4, are
unknown. (As an exercise, the student might demonstrate to his
own satisfaction the upper limit for 2 by 2 and 3 by 3 tables.)
The reader will also note that C can never be negative.
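One way to approach the exercise is to compute C for a k by k table in which the association is perfect, i.e., every case falls on the diagonal. The Python sketch below assumes equal marginal totals (a hypothetical 10 cases per category); with that arrangement $\chi^2 = N(k - 1)$, so C reduces to $\sqrt{(k - 1)/k}$.

```python
from math import sqrt


def contingency_c_perfect(k, cases_per_category=10):
    """C for a k-by-k table in which every case falls on the diagonal.

    Each row and column total equals cases_per_category, so every expected
    frequency is cases_per_category / k (row total x column total / N).
    """
    n = k * cases_per_category
    e = cases_per_category / k
    chi2 = 0.0
    for i in range(k):
        for j in range(k):
            o = cases_per_category if i == j else 0
            chi2 += (o - e) ** 2 / e
    return sqrt(chi2 / (n + chi2))


for k in range(2, 6):
    # computed C beside the stated limit sqrt((k - 1) / k)
    print(k, round(contingency_c_perfect(k), 4), round(sqrt((k - 1) / k), 4))
```

The choice of 10 cases per category is arbitrary; any equal number gives the same C, since both $\chi^2$ and N scale together.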
Despite having varying maximal values, contingency coefficients
have a decided advantage over other measures of relationship:
no assumptions involving the nature of the variables need be met.
Continuous or discrete variables, normal or skewed or any shaped
distributions for underlying traits, ordered or unordered series, and
combinations thereof are permissible.
Disadvantages are that any 2 contingency coefficients are not
comparable unless derived from tables of the same size, that they
are noncomparable to product moment r's (and estimates thereof)
unless certain corrections are applied, and that the formula for
sampling error is unwieldy. The necessary corrections and the
sampling error formula may be found in Kelley,† but before
consulting Kelley, the reader might bear in mind the following
comments.

† Kelley, T. L., Statistical method, pp. 266-271, New York: Macmillan, 1924.

In regard to the corrections, the first is for number of categories.
The additional correction to make C an estimate of r involves the
assumption that the underlying traits are continuous and normal
in distribution. Furthermore, this correction is very tedious to
make. It is suggested that, if the assumption of normally
distributed continuous variables is tenable, one is justified in reducing
a contingency table of more than 4 cells to a 2 by 2 table and
then determining the value of tetrachoric r. When reducing to a
fourfold table, one should combine adjacent categories so as to
have dichotomies as near to 50-50 proportions as possible. The
combination should not be made on the basis of the pattern of
cell frequencies, since this is likely to involve a capitalization or
decapitalization on chance. One might take several or all possible
fourfold combinations, thus securing several tetrachoric r's which
may then be averaged.

As to the unwieldy sampling error formula for C, it is suggested
that in so far as one wishes simply to test the null hypothesis, i.e.,
that there is no relationship between the 2 given variables, one
need only enter the value of $\chi^2$ into an appropriate probability
table to test its significance. If $\chi^2$ is significant, then the
relationship is significantly greater than zero. This use of $\chi^2$ will be
discussed in the next chapter. It should be remarked that, if
any one (or more) expected cell frequency is small, say less than
5, the resulting C may be quite erroneous.

Chi square for a fourfold table can be readily obtained by
formula without first computing expected frequencies. Thus for a
set of frequencies like that of Table 19 we have

$$\chi^2 = \frac{N(AD - BC)^2}{(A + B)(C + D)(A + C)(B + D)}$$

This resembles formula (81). In fact, there is a relationship
between the fourfold point coefficient ($r_p$), $\chi^2$, and C:

$$r_p^2 = \frac{\chi^2}{N} = \frac{C^2}{1 - C^2}$$
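The shortcut and the relationship among $r_p$, $\chi^2$, and C can be verified numerically. This Python sketch uses hypothetical frequencies and assumes the same top-row A, B and bottom-row C, D layout as formula (81); it computes $\chi^2$ both by the shortcut and by formula (83), then checks $r_p^2 = \chi^2/N = C^2/(1 - C^2)$.

```python
from math import isclose, sqrt


def chi2_shortcut(a, b, c, d):
    """Fourfold shortcut: N(AD - BC)^2 / ((A+B)(C+D)(A+C)(B+D))."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def chi2_long_way(a, b, c, d):
    """Formula (83) with E = row total x column total / N for each cell."""
    n = a + b + c + d
    cells = [  # (observed, row total, column total); assumed layout A B / C D
        (a, a + b, a + c), (b, a + b, b + d),
        (c, c + d, a + c), (d, c + d, b + d),
    ]
    return sum((o - r * k / n) ** 2 / (r * k / n) for o, r, k in cells)


def phi(a, b, c, d):
    """Formula (81)."""
    return (b * c - a * d) / sqrt((a + b) * (c + d) * (a + c) * (b + d))


a, b, c, d = 20, 30, 40, 10  # hypothetical frequencies
n = a + b + c + d
chi2 = chi2_shortcut(a, b, c, d)
r_p = phi(a, b, c, d)
cc = sqrt(chi2 / (n + chi2))  # formula (82)
print(round(chi2, 3), round(chi2_long_way(a, b, c, d), 3))
print(round(r_p ** 2, 4), round(chi2 / n, 4), round(cc ** 2 / (1 - cc ** 2), 4))
```

Because the relationship involves $r_p^2$, the sign of $r_p$ is lost in $\chi^2$; the direction of association must be read from the table itself.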

Other measures of association or of correlation between
attributes have been advocated. This is not the place to argue the pros
and cons of these other measures. It seems to the author that the
measures we have discussed are the more defensible.

THE CORRELATION RATIO OR η (ETA)

It will be recalled that one way of understanding the product
moment correlation coefficient is to note from the relationship
$r^2 = 1 - \sigma^2_{y.x}/\sigma^2_y$ (or $r^2 = 1 - \sigma^2_{x.y}/\sigma^2_x$) that the degree of
correlation is a function of the error of estimate variance relative to
the total variance of the variable being predicted by a linear
regression line. If the array means fail to fall on a straight line, it can
rightly be argued that better prediction can be made by using a
curve which really "fits" the means or by using the means
themselves. The latter procedure would entail an error of estimate
which would be a function of the variance within the arrays about
the array means. An over-all variance about the means of the
vertical arrays can be calculated by squaring the deviations about
the mean of each array, summing these for all arrays, and then
dividing by N. The resulting variance for the vertical arrays may
be labeled $\sigma^2_{ay}$; for the horizontal arrays, $\sigma^2_{ax}$.

The correlation ratio, η, in terms of the accuracy with which
Y's can be predicted from X's is defined as

$$\eta^2_{yx} = 1 - \frac{\sigma^2_{ay}}{\sigma^2_y} \qquad (84)$$

and for X's predicted from Y's, we have

$$\eta^2_{xy} = 1 - \frac{\sigma^2_{ax}}{\sigma^2_x} \qquad (84a)$$

Are two η's necessary? We have not proved herein that the
variance about the mean is smaller than about any other point, but
this fact is readily deducible from the computational formula for
σ in terms of deviations from an arbitrary origin. If AO
coincides with the mean, the mean of the squared deviations from AO will equal $\sigma^2$; if AO does not coincide
with the mean, a subtractive term will always be involved. It
follows that $\sigma_{ay}$ will be less than $\sigma_{y.x}$ and that $\sigma_{ax}$ will be less than
$\sigma_{x.y}$; hence both η's will exceed r, but to varying degrees,
depending upon the extent to which the array means fail to fall on a
straight line. Since it is possible, and likely, that the means for
the vertical arrays will not exhibit the same departure from
linearity as those for the horizontal arrays, it is not reasonable to
expect the two η's to agree.

The η's indicate the relative accuracy with which one can
predict on the basis of array means, and accordingly they are useful
measures of the extent of correlation when the regressions are
curvilinear. The correlation ratio can also be utilized when the
regression is linear; hence it is more generally applicable than the
product moment coefficient, which is useful only in the special
case where the assumption of linearity is tenable. The correlation
ratio, however, does not enter into the regression equation
constants.

Even if the regressions were exactly linear for some defined
population, a given sample would show deviations from linearity,
and therefore η's for successive samples would show chance
sampling deviations from r. By how much must η exceed r before
one suspects curvilinearity? The only adequate statistical test
for answering this question involves the analysis of variance
technique and hence is postponed to Chapter 16.

Another definition of η can be had by starting with the
proposition that the variance $\sigma^2_y$ can be broken down into components, a
predictable and an unpredictable part, or $\sigma^2_y = \sigma^2_{m_y} + \sigma^2_{ay}$, in
which $\sigma^2_{m_y}$ is the variance of the array means weighted for the
number of cases in the several arrays. Then we have η defined as
$\eta^2_{yx} = \sigma^2_{m_y}/\sigma^2_y$ and also as $\eta^2_{xy} = \sigma^2_{m_x}/\sigma^2_x$. These are analogous
to $r^2 = \sigma^2_{y'}/\sigma^2_y$ and $r^2 = \sigma^2_{x'}/\sigma^2_x$, and accordingly we may
interpret $\eta^2_{yx}$ as the proportion of Y variance explained by or associated
with variation in X.

Since the η's are most readily computed by methods to be
developed later (pp. 272-274), no illustration will be given here.
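The computing routine proper is deferred, but the two definitions of $\eta^2_{yx}$, formula (84) and $\sigma^2_{m_y}/\sigma^2_y$, can at least be checked against each other. The Python sketch below uses hypothetical data whose array means rise and then fall (2, 6, 2), a case where a straight line predicts nothing (r = 0) while the array means predict well.

```python
from collections import defaultdict
from statistics import mean, pvariance


def correlation_ratio_yx(xs, ys):
    """Return eta^2_yx computed both ways: 1 - var_ay/var_y and var_my/var_y."""
    arrays = defaultdict(list)  # the vertical arrays: Y values grouped by X
    for x, y in zip(xs, ys):
        arrays[x].append(y)
    n = len(ys)
    grand_mean = mean(ys)
    var_y = pvariance(ys)
    # variance about the array means, pooled over all arrays and divided by N
    var_ay = sum((y - mean(a)) ** 2 for a in arrays.values() for y in a) / n
    # variance of the array means, each weighted by the number of cases in its array
    var_my = sum(len(a) * (mean(a) - grand_mean) ** 2 for a in arrays.values()) / n
    return 1 - var_ay / var_y, var_my / var_y


def pearson_r(xs, ys):
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (pvariance(xs) * pvariance(ys)) ** 0.5


# hypothetical data: array means 2, 6, 2 for X = 1, 2, 3
xs = [1, 1, 2, 2, 3, 3]
ys = [1, 3, 5, 7, 1, 3]
eta2_a, eta2_b = correlation_ratio_yx(xs, ys)
print(round(eta2_a, 4), round(eta2_b, 4), round(pearson_r(xs, ys), 4))
```

Both forms give $\eta^2_{yx} = 32/41 \approx .78$ while r is zero, which illustrates why η is the more generally applicable measure when the regression is curvilinear.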

RANK CORRELATION 

When no measuring instrument is available for a trait, resort is
frequently made to rank-ordering by judges. One measure of
relationship between variables for which we have individuals
ranked is given by ρ (rho), the Spearman rank-difference correlation
coefficient,

$$\rho = 1 - \frac{6 \sum D^2}{N(N^2 - 1)} \qquad (85)$$

in which D is the difference between an individual's 2 ranks (for the
2 traits). When we have ranks for one variable and scores for the
other, we can use the scores as a basis for setting up ranks for the
latter, and then compute rho.

Whenever rankings on a given variable involve ties (the judges
fail to distinguish between 2 or more individuals, or the scores used
for ranking are such that 2 or more persons have the same score),
the ranks are split between individuals who are in tie positions.
Suppose 3 ranks have been assigned and that 2 individuals are
tied for the fourth position. If they were distinguishable, they
would use up ranks 4 and 5, so we assign each a value of 4.5. Had
3 persons tied for this position, we would split ranks 4, 5, and 6,
giving each a rank of 5. Then when we proceed to the remaining
individuals we must remember that rank position 6 has been used.

The computation of rho is illustrated in Table 23. The fact that
the algebraic sum of the D's must be zero can be utilized as a means
of checking the D-column values.

Table 23. Computation of Rank-Difference Correlation Coefficient, Rho

                Ranks          Differences
Persons      1st     2nd        D        D²
   A          3       1         2        4
   B          4       2         2        4
   C         10      10         0        0
   D          8       4.5       3.5     12.25
   E          5       6        −1        1
   F          9      11        −2        4
   G          1       3        −2        4
   H          2       7        −5       25
   I         13      13         0        0
   J         11       4.5       6.5     42.25
   K          7       8.5      −1.5      2.25
   L          6       8.5      −2.5      6.25
   M         12      12         0        0
                                0      105.00 = ΣD²

$$\rho = 1 - \frac{6(105)}{13(169 - 1)} = .71$$
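The tie-splitting rule and formula (85) can be sketched together in Python. The rank columns are copied from Table 23; the scores passed to `midranks` are hypothetical and serve only to show a 2-way tie for fourth place. Rank 1 is given to the highest score, which is an assumption, since the text does not fix the direction.

```python
def midranks(scores):
    """Rank scores (1 = highest), splitting rank positions among tied individuals."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        # extend the run of individuals who share the same score
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        shared_rank = (start + 1 + end + 1) / 2  # mean of the rank positions used up
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def spearman_rho(ranks_1, ranks_2):
    """Formula (85); also returns sum(D^2) for checking against the table."""
    n = len(ranks_1)
    d = [r1 - r2 for r1, r2 in zip(ranks_1, ranks_2)]
    if abs(sum(d)) > 1e-9:  # the check described in the text
        raise ValueError("the D's must sum to zero")
    sum_d2 = sum(x * x for x in d)
    return 1 - 6 * sum_d2 / (n * (n * n - 1)), sum_d2


# Table 23, persons A through M
first = [3, 4, 10, 8, 5, 9, 1, 2, 13, 11, 7, 6, 12]
second = [1, 2, 10, 4.5, 6, 11, 3, 7, 13, 4.5, 8.5, 8.5, 12]
rho, sum_d2 = spearman_rho(first, second)
print(sum_d2, round(rho, 2))              # 105.0 0.71
print(midranks([90, 85, 80, 75, 75, 70]))  # [1.0, 2.0, 3.0, 4.5, 4.5, 6.0]
```

Raising an error when the D's fail to sum to zero builds the text's checking device into the computation itself.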
Rho for ranks based on scores for 2 normally distributed
variables tends to be slightly (less than .02) lower than the product
moment r computed from the scores; hence rho is comparable with
r as a measure of the strength of relationship.
To test the significance of rho, for N of 10 or more, we may safely
use

$$t = \rho \sqrt{\frac{N - 2}{1 - \rho^2}}$$

which approximates the t distribution with N − 2 degrees of
freedom.
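Applied to Table 23, the test can be sketched as below. The sketch assumes the t form given above for rho and only computes t; referring it to a t table with N − 2 degrees of freedom is left to the reader, as in the text. The guard against N below 10 mirrors the text's warning.

```python
from math import sqrt


def t_for_rho(rho, n):
    """t = rho * sqrt((N - 2) / (1 - rho^2)), to be referred to t with N - 2 df."""
    if n < 10:
        raise ValueError("the text does not recommend this test for N below 10")
    return rho * sqrt((n - 2) / (1 - rho ** 2))


rho = 1 - 6 * 105 / (13 * (13 ** 2 - 1))  # Table 23
t = t_for_rho(rho, 13)
print(f"rho = {rho:.2f}, t = {t:.2f}, df = {13 - 2}")
```

For Table 23 this gives t of about 3.36 with 11 degrees of freedom.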

Rho does not possess the mathematical advantages inherent
in r, and therefore has merit only when the observations on one or
both variables are ranks instead of measures. Because of
judgmental difficulties in assigning ranks for N large, rank-order data
are apt to be confined to small samples, but for N less than 10 the t
test of the significance of rho is not satisfactory. Kendall§ has
proposed another measure, designated τ (tau), for use with ranks, which
is superior to rho in so far as testing significance is concerned
when N is very small. As a measure of the degree of relationship,
tau, like rho, has the property of being unity for a perfect
relationship; for zero and near zero correlation these 2 measures tend to be
alike numerically, but for other degrees of association tau tends to
be lower than rho, at times only two-thirds the magnitude of rho.
Thus tau is not comparable with rho (and r), and furthermore
there seems to be no specifiable way of estimating one from the
other. For a much more adequate discussion of both tau and rho,
the reader is referred to Kendall.

THE DISCRIMINANT FUNCTION 

Suppose we have 2 or more variables (measured in a graduated
fashion) which we wish to combine into a total score for the purpose
of discriminating between 2 groups. The question arises as to how
best to weight the variables so as to obtain maximum difference
between the total score means for the 2 groups. This difference must
be considered relative to the within-groups variability; otherwise we
could easily produce a large numerical difference by the simple
operation of summing the scores and multiplying by a large
constant, whereas the real purpose is to have score distributions with
the least amount of overlap for the 2 groups. We want the
difference to be maximal relative to the spread of scores within the
groups.

§ Kendall, M. G., Rank correlation methods, London: Griffin, 1948.

The simplest way to determine the weights for the several
variables is to compute the β's, thence the b's, as in the multiple
regression problem. For this purpose, the product moment
correlations among the 2 or more independent variables are
calculated, and the point biserial r is calculated between each
independent variable and $X_1$, the dependent variable (membership
in one or the other of the 2 groups, with one of the groups
consistently designated as corresponding to the first category for the
biserial setup).

Actually, since the problem here is that of ascertaining optimum
relative weights rather than fitting a regression plane, the constant A of the
regression equation need not be calculated, nor need we worry about
$\sigma_1$ ($= \sqrt{p_1 p_2}$ of the biserial setup). The weights may be taken
simply as $\beta_2/\sigma_2$, $\beta_3/\sigma_3$, etc., all multiplied by a constant so chosen
as to have weights which exceed, say, 10, thereby avoiding
decimals. Some of the weights may be negative, according to the sign
of the corresponding β. If all or a majority of the weights are
negative, the signs of all may be reversed. The relationship
of the total of the optimally weighted scores to group membership
is describable by the multiple r computed by equation (69a). Such
a multiple r is the point biserial between the total weighted scores
and belonging to one or the other of the 2 groups. Or one may
compute the weighted scores for all N cases and then make
distributions for the 2 groups separately in order to scrutinize the
amount of differentiation (or overlap) present.
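For 2 predictors the procedure can be sketched in Python with the standard 2-predictor β formulas. The scores and group assignments are hypothetical, and equation (69a) is not reproduced; instead the sketch uses $R^2 = \beta_2 r_{12} + \beta_3 r_{13}$ and checks the claim that the multiple r equals the point biserial between the weighted total and group membership.

```python
from math import sqrt
from statistics import mean, pstdev


def pearson_r(xs, ys):
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (pstdev(xs) * pstdev(ys))


def discriminant_weights(group, x2, x3):
    """Weights beta_i / sigma_i for two predictors of 0/1 group membership X1."""
    r12 = pearson_r(group, x2)  # point biserial r, X1 with X2
    r13 = pearson_r(group, x3)  # point biserial r, X1 with X3
    r23 = pearson_r(x2, x3)
    beta2 = (r12 - r13 * r23) / (1 - r23 ** 2)
    beta3 = (r13 - r12 * r23) / (1 - r23 ** 2)
    multiple_r = sqrt(beta2 * r12 + beta3 * r13)
    return beta2 / pstdev(x2), beta3 / pstdev(x3), multiple_r


# hypothetical scores: first four cases in group 1, last four in group 0
group = [1, 1, 1, 1, 0, 0, 0, 0]
x2 = [6, 7, 8, 5, 4, 5, 3, 4]
x3 = [3, 5, 4, 4, 4, 2, 3, 5]
w2, w3, multiple_r = discriminant_weights(group, x2, x3)
total = [w2 * a + w3 * b for a, b in zip(x2, x3)]
print(round(w2, 3), round(w3, 3))
print(round(multiple_r, 4), round(pearson_r(group, total), 4))
```

The two numbers on the second line should agree. Multiplying both weights by any positive constant (for example, to make them exceed 10) leaves that agreement unchanged.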



CHAPTER 


Frequency Comparison: Chi Square 


The quantity chi square ($\chi^2$), defined in the last chapter as

$$\chi^2 = \sum \frac{(O - E)^2}{E} \qquad (83)$$

or as the sum of the squared discrepancies between observed and
expected frequencies, each divided by the expected frequency, is a
statistic which is very useful in a variety of problems involving
frequencies. Let us begin with an examination of what might be
expected to happen if a penny were tossed 100 times. The
expected frequency for heads is 50, and for tails is also 50. If for a
particular series of tosses we secured 55 heads and 45 tails, the
discrepancies would be +5 and −5. When these discrepancies
are squared, each becomes +25, and dividing each squared
discrepancy by the expected value we would have .5 + .5 = 1.0 as
the value for $\chi^2$. Had we obtained 40 heads and 60 tails, the
discrepancies of −10 and +10, when squared and divided by E,
would give 2 + 2 = 4 as $\chi^2$.

Three things are readily apparent from the above: first, the
greater the discrepancy relative to E, the greater the contribution
to $\chi^2$; second, the two parts being summed to obtain $\chi^2$ are not
independent, since when the absolute discrepancy for heads is known,
that for tails can be inferred to be the same; and third, the squaring
process means that $\chi^2$ is always a positive quantity regardless of
the direction of the discrepancies.* A fourth fact becomes
apparent if one recalls what happens when a series of tosses is repeated.
The number of heads (or tails) secured will vary from one series
of 100 tosses to the next; hence the amount of discrepancy will
vary, and therefore the magnitude of $\chi^2$ will vary from series to
series. In other words, successive sampling will yield varying
values for $\chi^2$. If we knew the sampling distribution for $\chi^2$, we
could specify the probability of securing by chance as large a value
as any obtained $\chi^2$, and thereby we could judge whether a given
amount of discrepancy is significantly large enough to warrant
the conclusion that the coin is biased.
Situations similar to this arise in research work. We may, on
the basis of a hypothesis that a certain proportion of individuals
possess a given characteristic, state how many of a sample of N
cases would be expected to show the characteristic. Observations
on N cases will provide an observed number. If the hypothesis
is tenable, the discrepancy between observed and expected should
be no larger than might arise on the basis of chance. If the
obtained discrepancy is too large, i.e., not apt to arise by chance,
the hypothesis becomes suspect. The student who recalls that
the standard error of a proportion can be used in comparing
observed with expected proportions may wonder whether another
technique is necessary. The answer will be forthcoming.

CHI SQUARE AND THE BINOMIAL DISTRIBUTION 

Perhaps some insight regarding the sampling distribution of $\chi^2$
can be obtained by a re-examination of the binomial distribution,
which was discussed in Chapter 5. Suppose we consider the
binomial distribution, $(p + q)^{10}$, with p = q = 1/2, as yielding the
chance distribution of number of heads when 10 unbiased coins are
tossed (see Table 24). When 10 coins are tossed we expect to get
5 heads and 5 tails, that is, the E's are 5 and 5, but for a particular
toss we will have an observed number of heads (and tails) which
may differ from 5 and 5. The observed values, or O's, could be 10
heads and zero tails, 9 heads and 1 tail, and so on to zero heads and 10 tails.
If we obtained 9 heads and 1 tail, we could write $\chi^2 = (9 - 5)^2/5
+ (1 - 5)^2/5 = 6.4$. Similarly, if we compute $\chi^2$ for 10 heads
and no tails we get a value of 10.0, for 8 heads and 2 tails we get
3.6, etc. Note that for each $\chi^2$, $\sum E = \sum O = 10$.

Table 24. The Binomial and χ² when 10 Coins are Tossed

Number of Heads     f_b      χ²     f for χ²
      10              1     10.0        2
       9             10      6.4       20
       8             45      3.6       90
       7            120      1.6      240
       6            210      0.4      420
       5            252      0.0      252
       4            210      0.4
       3            120      1.6
       2             45      3.6
       1             10      6.4
       0              1     10.0
    Total          1024               1024

The third column of Table 24 gives the values of $\chi^2$ for various
possible sets of observed frequencies for number of heads and tails.
All the given numerical values of $\chi^2$, except 0, appear twice: 9
tails and 1 head will obviously lead to the same $\chi^2$ as 9 heads and 1
tail. Now the probability of obtaining 9 heads and 1 tail is 10/1024
and the P for 1 head and 9 tails is also 10/1024; hence the P for
obtaining a $\chi^2$ of 6.4 is 20/1024. Likewise, we may combine the
appropriate binomially derived chance frequencies ($f_b$) so as to
write the chance frequencies for the several $\chi^2$ values. These
appear as the fourth column of the table. We have thus established
the chance or probability distribution of $\chi^2$ for a specified coin
tossing situation. A plot of these frequencies against the $\chi^2$ values
will reveal a highly skewed distribution.

The probability of a $\chi^2$ as large as 6.4 will be 20/1024 + 2/1024,
or 22/1024, a value which obviously represents the probability of a
discrepancy between O and E as great as 4 in either direction
(at least 9 heads or at least 9 tails). The P of 22/1024 involves 1
tail of the distribution of $\chi^2$ values, but both tails of the binomial
contribute thereto. This fact will need to be recalled below when
we discuss one- vs. two-tailed tests of hypotheses.
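The construction of the fourth column of Table 24, and the P of 22/1024, can be reproduced exactly in Python. The sketch uses `fractions.Fraction` so that $\chi^2$ values such as 6.4 are kept exact and can serve as dictionary keys; the function name is illustrative.

```python
from collections import defaultdict
from fractions import Fraction
from math import comb


def chi_square_distribution(n_coins):
    """Exact chance distribution of chi^2 (heads vs. tails) for unbiased coins.

    Returns {chi^2 value: chance frequency} and the total number of outcomes 2^n.
    """
    e = Fraction(n_coins, 2)  # expected heads = expected tails
    freq = defaultdict(int)
    for heads in range(n_coins + 1):
        tails = n_coins - heads
        chi2 = (heads - e) ** 2 / e + (tails - e) ** 2 / e
        freq[chi2] += comb(n_coins, heads)  # binomial chance frequency f_b
    return dict(sorted(freq.items())), 2 ** n_coins


freq, total = chi_square_distribution(10)
for chi2, f in freq.items():
    print(f"chi-square {float(chi2):4.1f}: {f:3d} / {total}")

# probability of a chi-square as large as 6.4 (at least 9 heads or at least 9 tails)
p = Fraction(sum(f for chi2, f in freq.items() if chi2 >= Fraction(32, 5)), total)
print(p, float(p))
```

The printed probability appears in lowest terms, 11/512, which is the same as 22/1024. Summing only one tail of the $\chi^2$ distribution here automatically collects both tails of the binomial.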

Before we leave Table 24, it might be well to point out a
connection between $\chi^2$ and $x/\sigma$. Consider again an obtained frequency
of 9 heads. If we express 9 as a deviation from the mean of the
binomial, np = 5, relative to the σ of the binomial, $\sqrt{npq} = 1.581$,
we have 4/1.581, which when squared gives 6.401, or the
corresponding value of $\chi^2$ (within limits of rounding error). This agreement
is not accidental; as will be seen shortly, under specifiable conditions
$\chi^2 = (x/\sigma)^2$. Another characteristic of $\chi^2$ is obvious
from Table 24: for the 10 coin situation no values of $\chi^2$ other than
those given can be obtained because the possible number of heads
(and tails) is a discrete series. This lack of continuity imposes a
restriction on the use of $\chi^2$ which will receive more attention as we
proceed.
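For 2 categories the agreement between $\chi^2$ and $(x/\sigma)^2$ is exact, because the tails discrepancy is simply minus the heads discrepancy and $1/(np) + 1/(nq) = 1/(npq)$. The short Python sketch below, with an illustrative function name, checks this for every possible number of heads in the 10 coin case; it does not address the more general conditions the text promises to discuss later.

```python
from math import sqrt


def chi2_and_z_squared(heads, n=10, p=0.5):
    """Two-category chi^2 for an observed number of heads, and (x / sigma)^2."""
    q = 1 - p
    tails = n - heads
    chi2 = (heads - n * p) ** 2 / (n * p) + (tails - n * q) ** 2 / (n * q)
    z = (heads - n * p) / sqrt(n * p * q)  # deviation from np in binomial sigma units
    return chi2, z ** 2


for heads in range(11):
    chi2, z2 = chi2_and_z_squared(heads)
    print(heads, round(chi2, 3), round(z2, 3))
```

The identity does not depend on p = 1/2; the same check passes for, say, 20 trials with p = .3.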



The values in Table 24 are for possible discrepancies of
observed frequencies from an expected frequency of 5 for a single
toss of 10 coins. Suppose that we have, as shown in Table 25, an
observed distribution of frequencies obtained by tossing 7 coins
1000 times, and that we wish to compare these observed frequencies
with those expected on the basis of the binomial expansion. We
are not concerned this time with a single toss for which the
expectation would be 3.5, but rather with the results expected when
a large number of tosses are made. Note that both the E column
and the O column sum to 1000 (or N) and that the (O − E)'s sum
to zero. The several contributions to $\chi^2$ are given in the last
column, which sums to 7.65, or the $\chi^2$ for the entire table. Two
other series of 1000 tosses made by students in the author's classes
yielded $\chi^2$ values of 12.52 and 15.02. Two of these values for $\chi^2$ are
larger than any of the values in Table 24, and one reason for this
is the fact that more $(O - E)^2/E$ terms are being summed: 8 such
values instead of 2. Thus, the possible magnitude of $\chi^2$ would
seem to be a function of 2 things: the size of the squared
discrepancies (relative to their respective E's) and the number of
categories or possibilities for discrepancy. Actually, the chance or
sampling distribution of $\chi^2$ is only indirectly a function of the
number of discrepancies; it is a direct function of the number of
independent discrepancies, or the degrees of freedom, which we shall
next discuss.

Table 25. χ² for Discrepancies of Expected and Observed Frequencies
when 7 Coins Were Tossed 1000 Times

Number of Heads      E       O     O − E    (O − E)²/E
       7             8       4      −4        2.00
       6            55      55       0         .00
       5           164     157      −7         .30
       4           273     283      10         .37
       3           273     267      −6         .13
       2           164     177      13        1.03
       1            55      45     −10        1.82
       0             8      12       4        2.00
    Sums          1000    1000       0        7.65
                   (N)     (N)                (χ²)
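The E column and the total of Table 25 can be regenerated in Python. The sketch assumes, as the table appears to do, that each binomial expectation $1000 \cdot \binom{7}{k}/2^7$ is rounded to a whole number before the discrepancies are formed; the observed frequencies are copied from the table.

```python
from math import comb

# Table 25: observed frequencies of heads in 1000 tosses of 7 coins
observed = {7: 4, 6: 55, 5: 157, 4: 283, 3: 267, 2: 177, 1: 45, 0: 12}
n_tosses = sum(observed.values())  # 1000

# binomial expectation N * C(7, k) / 2^7, rounded to whole numbers as in the table
expected = {k: round(n_tosses * comb(7, k) / 2 ** 7) for k in observed}

contributions = {k: (observed[k] - expected[k]) ** 2 / expected[k] for k in observed}
chi2 = sum(contributions.values())

for k in observed:
    print(k, expected[k], observed[k], observed[k] - expected[k],
          round(contributions[k], 2))
print("chi-square =", round(chi2, 2))
```

Using the unrounded expectations (7.8125, 54.6875, and so on) would change $\chi^2$ slightly; the rounded values are used here only to match the table.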





DEGREES OF FREEDOM 

We have seen that the $\chi^2$ of 6.4 in Table 24 involves two
$(O - E)^2/E$ values, $(9 - 5)^2/5$ and $(1 - 5)^2/5$, or 2 discrepancies of
exactly the same absolute magnitude. This means that the 2
discrepancies are not independent: as soon as one is calculated, the
other can be written down at once without any further calculation;
hence 1 degree of freedom exists. If we study the data of Table 25,
we see that, since the discrepancies must sum to zero, all 8 cannot
be independent or vary freely. As soon as 7 are known, the eighth
is determined. This means that there are 7 degrees of freedom for
this situation. If we were to roll a die 600 times and then compare
the observed frequency for 6 spots, 5 spots, etc., with the number
expected on the basis of a perfectly homogeneous (unloaded) cube,
we would have 5 possible independent discrepancies, or 5 degrees
of freedom. In each of these situations the expected frequencies
are determinable on the basis of some a priori principle, and the
only restriction is that the total expected frequency must be the
same as the total observed frequency, i.e., $N_E$ must equal $N_O$. In
all such cases the number of degrees of freedom (df) is 1 less than
the number of categories.

The df for other situations in which the $\chi^2$ technique is
applicable will follow the same principles as to the number of independent
discrepancies, but not the rule just laid down. Suppose we
consider a 2 by 2 or fourfold table such as that given in Table 26
(which contains fictitious data for purpose of ease in exposition).
The expected frequencies are set up on the assumption that there
is no difference between the 2 groups (the null hypothesis). If
this were the case, we would expect that the 180 yeses would be
distributed in the 1 to 2 ratio of the right-hand totals; likewise
the 120 noes. Note that the expected frequencies reading across,
i.e., 40 and 60, and 80 and 120, are proportional to the marginal
totals at the bottom. In determining the df, we can observe either
of 2 things: first, that all 4 discrepancies have the same absolute
value, so that when 1 is known the other 3 can be written down
at once; or second, that in setting up the expected frequencies, we
are restricted by the requirement that the 2 top-row values must
sum to $N_1$, the next 2 must sum across to $N_2$, the left-hand column
must sum to $N_n$, and the next column to $N_y$; as soon as the value
40 has been ascertained, the remaining 3 expected values become
fixed. Either way we look at the situation, we see that there is
but 1 degree of freedom even though there are 4 cells or 4
discrepancies.

Table 26. χ² and Fourfold Table
(Expected frequencies in parentheses)

              No            Yes           Totals
Group 1     50 (40)       50 (60)       100 = N_1
Group 2     70 (80)      130 (120)      200 = N_2
Totals     120 = N_n     180 = N_y      300

The fundamental question is: How many of the discrepancies are independent? In practice this can be answered by determining how many categories or cells can be filled in at will before the others become fixed because of the restrictions imposed. If we turn back to Table 22 of the last chapter (p. 204), we see that the restrictions for a 3 by 3 table are similar to those for a 2 by 2 table: the expected frequencies must add across and down to the observed marginal totals. The student should ponder Table 22 long enough to see that the proper df is 4. The general rule-of-thumb for ascertaining the degrees of freedom for all contingency-type tables of k rows and l columns, where the marginal totals are utilized in setting up the expected frequencies, is to take df = (k − 1)(l − 1). Thus for the fourfold table we have (2 − 1)(2 − 1) = 1, and for the 3 by 3 table, (3 − 1)(3 − 1) = 4, etc. Such tables need not be square; very often the psychologist wishes to compare 2 groups on the basis of k possible responses to a question. For this k by 2 table, the df becomes (k − 1)(2 − 1), or simply k − 1.
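The procedure for contingency-type tables can be sketched in Python, assuming the table is given as a list of rows of observed frequencies. The name `contingency_chi2` is illustrative. Each expected frequency is (row total × column total)/N, which makes the expected values add across and down to the observed marginal totals, and df = (k − 1)(l − 1). Applied to Table 26 it should reproduce the expected frequencies shown in parentheses there.

```python
def contingency_chi2(table):
    """Chi square for a k-by-l table, with expected frequencies built from the marginal totals."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected frequency in each cell: (row total * column total) / N.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, expected

# Table 26: rows are groups 1 and 2, columns are No and Yes.
chi2, df, expected = contingency_chi2([[50, 50], [70, 130]])
print(expected, round(chi2, 2), df)
```

The df depends only on the shape of the table. Any 3 by 3 table gives 4, and a k by 2 table gives k − 1.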

SAMPLING DISTRIBUTION OF χ²

Before discussing further the applications of χ², we turn again to the sampling distribution of this statistic. It is easy enough to see from the coin tossing situations which we have considered above that chance leads to discrepancies between observed and expected frequencies. In those situations wherein we wish to compare groups, we know from the discussion of sampling in Chapter 5 that differences in responses or characteristics can and will arise as a result of chance sampling even though the 2 universes do not differ. Likewise, contingency tables involving the possible relationship between 2 categorized variables will yield varying chance values of χ² even though no real association exists. Knowing the chance sampling distribution of χ² for various degrees of freedom, we can specify the probability of obtaining a χ² as large as any value and conclude therefrom, according to the situation, that observations do not agree with hypothesized frequencies, or that 2 or more groups differ significantly, or that a real association exists.

We have already suggested that, for 1 degree of freedom, the distribution of χ² is the same as that of (x/σ)². The general equation for the χ² distribution* involves an n, or the df, and therefore there is no one χ² distribution but a very large number of distributions, one for each value of n. It happens that practical work seldom involves more than 30 degrees of freedom, so that we need not concern ourselves with all possible distributions. Curves for the distribution of χ² can be drawn for various n's, with χ² along the abscissa and the ordinates as the y values obtained by the equation in the footnote. The area under each curve will be 1 unit, as in the unit normal curve. Figure 14 contains curves for 7 different values of n or df, so drawn as to be comparable. Note that the shapes of these curves and their general locations along the abscissa vary with n.

*
$$y = \frac{(\chi^2)^{(n-2)/2}\, e^{-\chi^2/2}}{2^{n/2}\,\Gamma(n/2)}$$
in which Γ indicates the gamma function as defined in texts in advanced calculus.

Fig. 14. Chi square distributions for various df's (χ² along the abscissa)

For n = 1, or for 1 degree of freedom, the curve starts very high (strictly speaking, it is asymptotic to the ordinate and hence starts at infinity) and drops quite rapidly. For this curve the height or y value at χ² = .16 is .92 (not shown). At χ² = .01, the height is more than 4 times greater than .92. By the time we reach a χ² of 1.00, the height is .242 (what x/σ value does this height correspond to when the unit normal curve is considered?). Then the curve trails off until, at χ² = 6.25, the height is about .007. Regardless of n, the right-hand parts of the curves never reach the base line, i.e., they are asymptotic. If we think of the total area under any curve as unity, then the area between ordinates erected at any 2 base-line points, or the area beyond any point, can be expressed as a proportion of the total. Thus, for n = 1, .99 of the area is beyond (to the right of) a χ² value of .000157, and only .05 is beyond 3.841. Stated differently, the probability of obtaining a χ² value as large as 3.841 is .05; for χ² as large as 6.635, P = .01; and the P = .001 point is at a χ² of 10.827. These hold only for df = 1.
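The heights and areas just quoted can be checked numerically. The sketch below is a minimal Python version, assuming only the standard `math` module. It evaluates the footnote's equation with `math.gamma`, and, for 1 df only, it obtains P from the fact that χ² = (x/σ)², so the area beyond a χ² value equals both unit-normal tails beyond √χ², i.e. erfc(√(χ²/2)). The function names are illustrative, not the text's.

```python
import math

def chi2_density(x: float, n: int) -> float:
    """Ordinate y of the chi square curve with n df at abscissa x (x > 0; x = 0 is allowed for n >= 2)."""
    return x ** ((n - 2) / 2) * math.exp(-x / 2) / (2 ** (n / 2) * math.gamma(n / 2))

def chi2_tail_df1(x: float) -> float:
    """Area beyond x for 1 df: both unit-normal tails beyond sqrt(x), i.e. erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2))

# Heights for n = 1 at the points discussed above, and for n = 2 at the origin.
for x in (0.01, 0.16, 1.00, 6.25):
    print(f"n = 1, chi2 = {x:4.2f}: y = {chi2_density(x, 1):.3f}")
print(f"n = 2, chi2 = 0.00: y = {chi2_density(0.0, 2):.3f}")

# Right-tail areas for n = 1 at the tabled .05, .01 and .001 points.
for x in (3.841, 6.635, 10.827):
    print(f"chi2 = {x}: P = {chi2_tail_df1(x):.4f}")
```

At χ² = 1.00 the first function returns the unit normal ordinate at x/σ = 1. That answers the parenthetical question above.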

The curve for n = 2 starts at a height of .50 and then descends, but less rapidly than that for n = 1. It is readily seen that large values for χ² occur more frequently when n = 2 than when n = 1. The P = .05 point is at 5.991; i.e., the probability of obtaining by chance a χ² value as great as 5.991 is .05. The .01 point is at 9.210, and the .001 point is at 13.815.

For n = 3, the distribution curve begins at zero height, rises sharply to a maximum (modal value) at χ² = 1, and then falls off so that the P = .01 point is at χ² = 11.341. As n is taken larger and larger, the distributions become less and less skewed and move farther and farther to the right. The mean of a given distribution always corresponds to a χ² equal to n, and except for n = 1 the modal value is at a χ² of n − 2.
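Tail areas for any whole-number df can be computed without tables by using the standard closed-form sums for the chi square integral. The sketch below assumes integer df and uses only `math`; the name `chi2_sf` is illustrative. For even n the area beyond χ² is e^(−χ²/2) times the sum of (χ²/2)^i/i! for i below n/2. For odd n it starts from the 1-df value (both normal tails beyond √χ²) and adds terms in √(2χ²/π) e^(−χ²/2).

```python
import math

def chi2_sf(x: float, n: int) -> float:
    """Probability of a chance chi square as large as x with n (a positive integer) df."""
    if n < 1:
        raise ValueError("df must be a positive integer")
    half = x / 2.0
    if n % 2 == 0:
        # Even df: e^(-x/2) * sum_{i=0}^{n/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, n // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: 1-df area plus sqrt(2x/pi) e^(-x/2) * sum_{i=1}^{(n-1)/2} x^(i-1) / (1*3*...*(2i-1)).
    p = math.erfc(math.sqrt(half))
    term = math.sqrt(2 * x / math.pi) * math.exp(-half)
    for i in range(1, (n - 1) // 2 + 1):
        p += term
        term *= x / (2 * i + 1)
    return p

# The tabled points quoted above for n = 2 and n = 3.
for x, n in ((5.991, 2), (9.210, 2), (13.815, 2), (11.341, 3)):
    print(n, x, round(chi2_sf(x, n), 4))
```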

The distributions of χ² for varying n's are theoretical probability distributions. They may be interpreted as random sampling distributions, and by them one can judge the statistical significance of discrepancies. Their use is exactly analogous to testing the significance of the difference between means, which, it will be recalled, involves setting up the null hypothesis: if there is no real difference between 2 universe means, the D/σ_D values for successive samples will form a normal curve with center at zero and with unit variance. If a found difference is 1.96 times its standard error, the null hypothesis becomes suspect; if it is 2.58 times its standard error, the hypothesis of no difference can fairly safely be rejected; if D/σ_D = 3.00, rejection is more definitely indicated. These 3 CR's, it will be recalled, correspond to the .05, the .01, and the .003 levels of significance for two-tailed tests.

Now χ² can likewise be used to test the null hypothesis. The essential difference between the D/σ_D and the χ² techniques is that the latter involves skewed probability distributions, but, knowing the distribution for a given n, one can ascertain the necessary value of χ² for the .05, the .01, the .001, or other levels of significance. The statement of the null hypothesis in connection with χ² may vary slightly according to the given situation. The hypothesis may be that the frequencies in the universe agree with the a priori expected frequencies, that the frequencies in 2 or more universes are the same, or that there is zero association in the universe between 2 classifications or variables. If any such condition holds for the universe or universes, then successive samplings will yield χ² values which will distribute themselves in a determinable manner, thus permitting one to specify the probability of obtaining by chance a χ² value as large as any given or obtained value. When this probability is small, say .01 or less, the null hypothesis is rejected, and its rejection implies that there are real discrepancies, that real differences exist, or that there is a real association.
Since the random sampling distribution of χ² depends upon the df, which varies from situation to situation, it is not feasible to give a rule-of-thumb criterion in terms of the magnitude of χ² which would be deemed significant. If we adopt P = .01 as the level of significance we wish to attain, then we need to refer to available tables of χ² in order to find how large χ² must be to correspond to this level; likewise for any other chosen level of significance. Probability tables for χ² are available in 2 forms. One form, Fisher's (see Table D of the Appendix), gives the values of χ² which will be exceeded by chance a specified proportion of the time, such as .10, .05, .01, and .001. Elderton's table† gives the probabilities for obtaining chi squares as large as specified values expressed as integers, such as 1, 2, 3, ..., 21, 22. Both tables include varying degrees of freedom. Because of an early erroneous notion as to the meaning of degrees of freedom, Elderton's table must be entered with df equal to 1 less than his n' values, e.g., use n' = 4 when n or df = 3. Elderton's table has one advantage over that given in our Appendix: P values as small as .000001 can be ascertained.

For n's larger than 30, the expression √(2χ²) − √(2n − 1) will have a sampling distribution which follows very closely the unit normal curve. The probability is accordingly .05 that this expression will exceed +1.64, and .01 that it will exceed +2.33, by chance.
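The large-n rule can be turned around to estimate the χ² needed for a given level: solve √(2χ²) − √(2n − 1) = z for χ². A minimal sketch using Python's `statistics.NormalDist` for the normal deviate follows; the function names are illustrative.

```python
import math
from statistics import NormalDist

def fisher_z(chi2: float, n: int) -> float:
    """sqrt(2 * chi2) - sqrt(2n - 1): approximately a unit normal deviate when n is large."""
    return math.sqrt(2 * chi2) - math.sqrt(2 * n - 1)

def approx_critical_chi2(n: int, p: float) -> float:
    """Chi square exceeded by chance with probability p (right tail), by inverting fisher_z."""
    z = NormalDist().inv_cdf(1 - p)          # about 1.64 for p = .05, 2.33 for p = .01
    return (z + math.sqrt(2 * n - 1)) ** 2 / 2

for n in (30, 40):
    print(n, round(approx_critical_chi2(n, 0.05), 2), round(approx_critical_chi2(n, 0.01), 2))
```

At n = 30 this gives about 43.5 and 50.1. The exact values in standard χ² tables are 43.773 and 50.892, so the approximation runs slightly low at the edge of the range where it is recommended.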

Before the possible applications of χ² are summarized, a word should be said about the underlying assumptions which restrict its usage. The probability figures in the tables of χ² are based on continuous distributions, whereas, as pointed out earlier, the chi squares calculated in practice form a discrete series. It is assumed that the distribution of the latter can be approximated by the former. This is similar to approximating the binomial by the normal curve. A second assumption is that the sampling distribution of the observed frequencies about a given E follows the normal curve. One can seldom, if ever, check on the tenability of this assumption, but it is possible to specify conditions where the assumption will not hold. If any one E is small, it is not possible to have a normal distribution of O's about it even though the total N is large. For instance, if E = 2, the O's are restricted on one side of E to zero and 1, whereas on the other side the possible values run 3, 4, 5, and upward. Such a curtailment ordinarily leads to a skewed distribution for the observed frequencies. Now it is obvious that, when E is small, we have a greater amount of discontinuity; hence the sampling distribution of observed frequencies will be discrete instead of continuous as called for by the normal curve. It would seem, therefore, that small expected frequencies lead to a violation of both the fundamental assumptions underlying the use of χ². Various criteria have been proposed for the required size of E. Some say that the technique is inapplicable when any one E is less than 10; others say that an E may be as small as 5. We would suggest that, when possible, adjacent categories be combined so as to have no E less than 10; if such a combination is impossible and an E is less than 10 but greater than 5, χ² may be used, providing one is cautious as to the conclusions drawn therefrom. A correction for continuity when df is 1, as in a fourfold table, is available and will be given later.

† Table XII in Pearson, Karl, Tables for statisticians and biometricians, part I, Cambridge: Cambridge University Press, 1931.

A third assumption is that the observations be independent of one another. This assumption is violated when the total of the observed frequencies exceeds the total number of persons in the sample(s). Such an inflation of N occurs when multiple observations are made on each person and each person is counted more than once (cf. p. 99).


APPLICATIONS 

The chief situations for which it is permissible to use χ² may be classified into:

1. The discrepancy of observed frequencies from frequencies expected on the basis of some a priori principle. Such situations are most frequently found in genetics, wherein it is hypothesized that certain crossings should lead to the presence, in a certain proportion of offspring, of some defined characteristic or variation thereof. The frequency table for such situations is 1 by k, with k − 1 degrees of freedom, since the only restriction is that the expected frequencies must sum to N. This type of situation does not arise often in research in the social sciences.

2. Contingency tables. Here we have 2 types of situations which differ only in the methods of classifying:

a. We may have a contingency table which is analogous to a correlation table in that both classifications are based on continuous or ordered discrete variables for which we have only categorized information for N individuals. The 2 variables might be in dichotomy (fourfold table), or one might be a dichotomy and the other manifold, or both might involve multiple categories. For these contingency tables it is meaningful to speak of the correlation between the 2 variables, and the degree of correlation might be appropriately specified by the tetrachoric r, the fourfold point r, or the contingency coefficient (corrected or uncorrected); which measure is used depends upon meeting the requisite assumptions. In so far as we are concerned only with χ², we have the means for testing the significance of the correlation or association as a chance departure from zero or no relationship, and the significance test can be used without knowledge of the degree of correlation. Such a test of significance is sometimes spoken of as a test of independence: are the 2 classifications independent? If so, χ² should be no larger than would arise by chance. If we have evidence for correlation or a lack of independence from the χ² technique, we can proceed to calculate an appropriate coefficient for measuring the degree of correlation or the strength of association. The student should, as an exercise, convince himself that χ² is not a measure of association.
b. The other contingency-type situation involves classification into categories for one variable vs. classification into unordered groups for the other, or one unordered grouping vs. another. The fundamental problem is apt to be that of comparing 2 or more groups with regard to multiple responses; i.e., we want a test of the difference between groups rather than a measure of correlation, which would not be entirely meaningful except in the loose sense that a particular response is associated more often with a particular group. As previously stated, the df for a k by l contingency table is (k − 1)(l − 1).

3. Goodness of fit. If we wish to check on whether it is reasonable to believe that a given frequency distribution is, within the limits of chance sampling, of the normal or some other specified type, a frequency curve having the same basic constants (e.g., N, M, and σ for the normal curve) as those computed from the observed frequency distribution can be fitted to the data. If a normal curve is being fitted, the table of normal curve functions is used to set up the theoretical or expected frequencies for the several grouping intervals. Then χ² can be computed in the usual manner. The df will correspond to the number (k) of grouping intervals less the number of constants derived from the data and used in the fitting process. For the normal curve the observed and theoretical distributions are made to agree as to N, M, and σ; hence df = k − 3. An attempt will be made later to explain the reasoning back of the determination of df when checking the goodness of fit of frequency curves.
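The normal-curve fitting step can be sketched as follows. `statistics.NormalDist(...).cdf` stands in for the table of normal curve functions. The fitted constants and interval limits in the usage line are hypothetical, chosen only to show the mechanics, and the function names are illustrative. The lowest and highest intervals are treated as open-ended, so the expected frequencies sum to N, and df is taken as k minus the 3 fitted constants.

```python
from statistics import NormalDist

def normal_expected(limits, n_total, mean, sd):
    """Expected frequencies for k = len(limits) + 1 grouping intervals under a fitted normal curve.

    The lowest and highest intervals are open-ended, so the expected frequencies sum to n_total.
    """
    dist = NormalDist(mean, sd)
    cum = [0.0] + [dist.cdf(b) for b in limits] + [1.0]
    return [n_total * (hi - lo) for lo, hi in zip(cum, cum[1:])]

def goodness_of_fit(observed, expected, constants_fitted=3):
    """Chi square, and df = k minus the number of constants (N, M, sigma) taken from the data."""
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, len(observed) - constants_fitted

# Hypothetical fitted constants: N = 200, M = 50, sigma = 10; 5 intervals split at 35, 45, 55, 65.
expected = normal_expected([35, 45, 55, 65], n_total=200, mean=50, sd=10)
print([round(e, 1) for e in expected])
```

With real data, the observed interval frequencies would be passed to `goodness_of_fit` along with `expected`. Any interval whose E falls below the limits discussed earlier would first be combined with a neighbor, and that combining reduces k and therefore the df.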
Fourfold contingency tables. For illustrative purposes, let us first apply χ² to a couple of 2 by 2 contingency tables for which the tetrachoric r, as well as the contingency coefficient, is an appropriate measure of the degree of correlation. Before we do this, it might be well to recall that χ² for a fourfold table can be computed by a simple formula which does not require calculation of the 4 expected frequencies. Let the fourfold frequencies and marginal totals be set up as in Table 27.

Table 27. Setup for Computing χ² from a Fourfold Table by Means of a Formula

            A          B         A + B
            C          D         C + D
          A + C      B + D         N

Chi square can be computed from

$$\chi^2 = \frac{N(AD - BC)^2}{(A + B)(C + D)(A + C)(B + D)} \qquad (86)$$


This is simpler than calculation from the discrepancies between observed and expected frequencies. The requisite that no expected frequency shall be less than 5 still holds. A quick check on this can be obtained by multiplying the smaller right-hand marginal frequency by the smaller frequency on the bottom margin and dividing the product by N. This will yield the smallest expected frequency. In Table 28 will be found 2 fourfold tables for Stanford-Binet items.

Table 28. χ² Applied to Contingency (Fourfold) Tables

                 Item 1                                 Item 3
              −      +    Total                      −      +    Total
Item 2   +   29     39      68         Item 4   +    34     37      71
         −   22     10      32                  −    94     35     129
Total        51     49     100         Total        128     72     200

         χ² = 5.93                                 χ² = 12.40
         P about .01                               P less than .001

Direct substitution into formula (86) yields the 2 chi squares at the bottom of the table. The P values are approximately .01 and less than .001, respectively. We can be reasonably sure that there is some correlation between the first 2 items, and fairly certain that items 3 and 4 are correlated. The value of the tetrachoric r is .40 for each table, and the contingency coefficient (with no corrections) is .24 for each table. Thus we see that the P's associated with the same degree of correlation can be different. Why? Would it be possible for 2 fourfold tables to yield the same χ² P, yet differ in the degree of relationship?
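Formula (86), the quick check on the smallest expected frequency, and the uncorrected contingency coefficient C = √(χ²/(N + χ²)) can be combined in a short Python sketch. The cells are passed in the A, B, C, D order of Table 27, read row by row from the two Table 28 layouts. The function names are illustrative.

```python
import math

def fourfold_chi2(a, b, c, d):
    """Formula (86): chi square for a fourfold table laid out as in Table 27."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def smallest_expected(a, b, c, d):
    """Quick check: smaller row total times smaller column total, divided by N."""
    n = a + b + c + d
    return min(a + b, c + d) * min(a + c, b + d) / n

def contingency_coefficient(chi2, n):
    """Uncorrected contingency coefficient C = sqrt(chi2 / (N + chi2))."""
    return math.sqrt(chi2 / (n + chi2))

tables = {"items 1, 2": (29, 39, 22, 10), "items 3, 4": (34, 37, 94, 35)}
for label, cells in tables.items():
    x2 = fourfold_chi2(*cells)
    print(label, round(x2, 2), round(smallest_expected(*cells), 2),
          round(contingency_coefficient(x2, sum(cells)), 2))
```

Both tables give C = .24 but quite different χ² values. The reason is that N is 100 in one and 200 in the other, which bears directly on the "Why?" posed above.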

Another application of χ² to fourfold tables is given in Table 29, in which the sexes at 4 age levels are compared in performance on a Stanford-Binet item.

Table 29. χ² Used to Test Sex Differences in Passing (+) or Failing (−) a Binet Item

             −       +     Total       χ²       P
Boys        84      18      102
Girls       93       8      101       4.30    <.05
Total      177      26      203

Boys        66      36      102
Girls       80      20      100       5.89    <.02
Total      146      56      202

Boys        58      44      102
Girls       62      39      101        .43    <.60
Total      120      83      203

Boys        37      66      103
Girls       52      49      101       5.02    <.05
Total       89     115      204

None of the χ² values reaches 6.635, the value corresponding to the .01 level of significance, but 3 of them are large enough to suggest a real sex difference. That a real difference may exist is also suggested by the fact that the boys are consistently superior at all 4 age levels. This brings us to an important property of χ²: the several chi squares for independent (i.e., based on different samples) tables may be summed to a total χ², with df equal to the sum of the df's for the chi squares being summed. Thus for Table 29 we have 4.30 + 5.89 + .43 + 5.02 = 15.64 as a χ² based on 4 degrees of freedom, by which we can judge the significance of the over-all sex differences shown in the 4 tables. With χ² = 15.64 and n = 4, we find (from Table D) that P is less than .01 (for n = 4, a χ² of 13.28 corresponds to the .01 level). If one turns to Elderton's tables, it can be ascertained that P is about .004. In other words, as great a sex difference, considering all 4 age groups, would arise 4 times in 1000 by chance; hence it would be concluded that a real difference does exist for this item.
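The summing of independent chi squares can be reproduced from the cell frequencies of Table 29. The sketch below applies formula (86) to each age level, adds the four values, and gets P for 4 df from the closed form for an even df, e^(−χ²/2)(1 + χ²/2). The function names are illustrative.

```python
import math

def fourfold_chi2(a, b, c, d):
    """Formula (86) for a fourfold table (rows: boys, girls; columns: -, +)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def chi2_sf_even(x, df):
    """Exact right-tail P for an even df: e^(-x/2) * sum_{i < df/2} (x/2)^i / i!."""
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))

# (boys -, boys +, girls -, girls +) for the 4 age levels of Table 29.
age_tables = [(84, 18, 93, 8), (66, 36, 80, 20), (58, 44, 62, 39), (37, 66, 52, 49)]
chis = [fourfold_chi2(*t) for t in age_tables]
total = sum(chis)      # independent samples, so the chi squares may be added
df = len(chis)         # 1 df per fourfold table
print([round(x, 2) for x in chis], round(total, 2), df, round(chi2_sf_even(total, df), 4))
```

Unrounded, the total is about 15.65, a shade above the 15.64 obtained by adding the rounded values. Either way P comes out near .0035, in line with the "about .004" read from Elderton's tables.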

This combinational property of χ² is important for all situations where frequency data from different groups cannot first be legitimately combined because of age or other differences. It is most useful when consistency is present among several comparisons, none of which taken singly possesses statistical significance. However, neither consistency nor insignificance for single comparisons constitutes a requisite for using the sum of chi squares as an over-all test of significance or as a means of arriving at 1 summary probability figure.

The single age comparisons in the above example could, of course, be made by means of proportions. This could be done by formula (21) of Chapter 5, the discussion of which (pp. 60-61) should be reviewed at this time. Let us examine the connection between the χ² technique and the D/σ_D for proportions method of testing the significance of the difference between 2 groups, the individuals of which have been classified as either passing or failing, saying either yes or no, possessing or not possessing a characteristic, etc. All such comparisons begin with a fourfold frequency table of the type symbolized in Table 27, or an equivalent (the frequencies may have been recorded for only 1 category of the dichotomy, say the yeses, from which the frequencies for the other category may be readily inferred by subtraction). Table 30 contains the basic table of frequencies for the presence (+) or absence (−) of a characteristic for groups 1 and 2, and the basic table of proportions obtained by dividing the frequencies by the proper N's.

Table 30. Schema for Comparing Groups via χ² and via Difference between Proportions (or Percentages)

Frequencies
                   +               −
Group 1            A               B           A + B = N1
Group 2            C               D           C + D = N2
                 A + C           B + D             N

Proportions
                   +               −
Group 1       p1 = A/N1       q1 = B/N1        p1 + q1 = 1.0
Group 2       p2 = C/N2       q2 = D/N2        p2 + q2 = 1.0
              p = (A + C)/N   q = (B + D)/N    p + q = 1.0

Note that the p and q values on the bottom margin are the proportions to use in formula (21) for the standard error of the difference between p1 and p2. Note also that p1 = A/N1 = A/(A + B) and that p2 = C/N2 = C/(C + D).

In order to avoid carrying along a square root sign or radical, and for another reason which if not now obvious will soon become so, let us write the square of the expression for the critical ratio of the difference between the two proportions, p1 and p2, thus:

$$\frac{D^2}{\sigma_D^2} = \frac{(p_1 - p_2)^2}{pq\left(\dfrac{1}{N_1} + \dfrac{1}{N_2}\right)}$$

When we replace all the proportions by their equivalents involving frequencies and the proper N's and also substitute frequencies for N1 and N2, we have

$$
\begin{aligned}
\frac{D^2}{\sigma_D^2}
&= \frac{\left[\dfrac{A}{A+B} - \dfrac{C}{C+D}\right]^2}
        {\dfrac{A+C}{N}\cdot\dfrac{B+D}{N}\cdot\dfrac{1}{A+B} + \dfrac{A+C}{N}\cdot\dfrac{B+D}{N}\cdot\dfrac{1}{C+D}} \\[6pt]
&= \frac{\dfrac{(AC + AD - AC - BC)^2}{[(A+B)(C+D)]^2}}
        {\dfrac{(A+C)(B+D)(C+D) + (A+C)(B+D)(A+B)}{N^2(A+B)(C+D)}} \\[6pt]
&= \frac{(AD - BC)^2 N^2}{(A+B)(C+D)\left[(A+C)(B+D)(C+D) + (A+C)(B+D)(A+B)\right]} \\[6pt]
&= \frac{(AD - BC)^2 N^2}{(A+B)(C+D)(A+C)(B+D)(A+B+C+D)} \\[6pt]
&= \frac{N(AD - BC)^2}{(A+B)(C+D)(A+C)(B+D)} = \chi^2
\end{aligned}
$$

which equals χ² as given by formula (86) for the fourfold table, since A + B + C + D = N. This confirms a fact already mentioned: χ² for 1 degree of freedom is the same as the square of the critical ratio. Since formula (21) is applicable only for comparing proportions based on independent samples, it follows that χ² is similarly restricted. That is, χ² as computed from a fourfold table by (86) does not allow for any correlational factor which might be introduced because the 2 groups consist of paired or matched individuals. Nor does it allow for the correlational factor which would be present if p1 and p2 (or the corresponding frequencies) were based on the same individuals, as in a pretest, intervening experience, posttest situation.
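The algebra can be spot-checked numerically. The sketch below computes the squared critical ratio (p1 − p2)²/[pq(1/N1 + 1/N2)] directly from proportions, as in the Table 30 layout, and compares it with formula (86) on the data of Table 26. Which column is called "+" does not affect either quantity. The function names are illustrative.

```python
def critical_ratio_squared(a, b, c, d):
    """(p1 - p2)^2 / [pq(1/N1 + 1/N2)], with rows = groups and the first column taken as '+'."""
    n1, n2, n = a + b, c + d, a + b + c + d
    p1, p2 = a / n1, c / n2
    p = (a + c) / n          # pooled proportion from the bottom margin
    q = 1 - p
    return (p1 - p2) ** 2 / (p * q * (1 / n1 + 1 / n2))

def fourfold_chi2(a, b, c, d):
    """Formula (86)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

cells = (50, 50, 70, 130)   # Table 26: group 1 No/Yes, group 2 No/Yes
print(critical_ratio_squared(*cells), fourfold_chi2(*cells))
```

Both print as 6.25 up to floating-point rounding, corresponding to a critical ratio of 2.5.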

Significance of changes. The student should carefully note that although the application of χ² to fourfold tables of frequencies like that of Table 6 in Chapter 5, which is here reproduced with minor changes as Table 31, provides a means of testing the significance of the association or correlation between 2 sets of responses, such an application does not test the significance of change from the first to the second set of responses. This latter test can be made by means of formula (20) of Chapter 5, p. 59.

Table 31. Fourfold Table of Frequencies and Proportions for a First Set vs. a Second Set of Responses from the Same Individuals

Frequencies
                       2nd
                   −           +
1st      +         A           B          A + B
         −         C           D          C + D
                 A + C       B + D          N

Proportions
                       2nd
                   −           +
1st      +        A/N         B/N          p1
         −        C/N         D/N          q1
                  q2          p2           1.0

It is also possible to test the significance of any found change by the use of χ². To do this, we first note that a net change for the group must necessarily involve the difference between the frequencies A and D, since the B and C cases represent those who showed no change. The null hypothesis would be that the universe frequencies are not different, i.e., for a given sample, A and D would differ only as a result of chance sampling. Since A + D represents the total number of individuals who changed (the A's from + to −, and the D's from − to +), in setting up the null hypothesis concerning the net change it would seem appropriate to say that, if A + D individuals changed, (A + D)/2 would change in one direction and (A + D)/2 in the other direction. Thus (A + D)/2 would become the expected frequency; then A − (A + D)/2 and D − (A + D)/2 would become the discrepancies between observed and expected (on the basis of the null hypothesis) frequencies. If A = D, both discrepancies would become zero. Squaring each discrepancy, dividing by E, and then summing the 2 quotients (or doubling either one) will give a χ² which is based on 1 degree of freedom (why 1 degree of freedom?). A little algebraic manipulation shows that

$$\chi^2 = \frac{(A - D)^2}{A + D} \qquad (87)$$

for the particular situation in which we wish to test the significance of over-all changes.

Comparison of formula (87) with formula (19a), p. 58, shows that we again have a χ² with 1 degree of freedom which equals the square of an x/σ, or critical ratio. The reasoning back of the statement given on p. 60 that formulas (19a), (19b), and (20) are inapplicable unless A + D equals 10 or more should now be clearer to the reader. If A + D were less than 10, the two E's would be less than 5, an acceptable though none too conservative lower limit for E. A correction (for continuity) needed when the E's are smaller than 10 will be given shortly. One thing which may puzzle the reader at this time is the fact that formula (87) does not contain a total N. Its algebraic equivalent, (D/σ_D)², with σ_D calculated by formula (20), does contain N, so the absence of N from (87) is more apparent than real.

The advantage of the χ² over the D/σ_D technique for testing the significance of net changes in responses lies in the fact that χ² values for 2 or more groups which have been used in an experiment can be summed to a new χ² with n equal to the sum of the separate df's; in this case n equals the number of chi squares being summed.

Formula (87) is, of course, not restricted to situations involving changes in responses. If we have the same individuals giving, say, yes or no responses to 2 different questions and we desire to test the significance of the difference between the frequencies (or proportions) of yeses or noes, formula (87) is applicable. Or suppose we wish to know whether there is a significant difference in the difficulty of 2 test items which have been administered to the same group. For example, in Table 28 we have 49 and 68 individuals passing items 1 and 2 respectively. Since N = 100, the proportions are .49 and .68 (or 49 and 68 per cent). By formula (87) we have χ² = (29 − 10)²/(29 + 10) = 9.26, which for 1 degree of freedom falls between the .01 and .001 levels of significance; hence it would be concluded that the 2 items are different in difficulty. If we use formula (20), we get a critical ratio, (p2 − p1)/σ_D = (.68 − .49)/[√(10 + 29)/100] = .19/.0624 = 3.04, which leads to the same probability figure as that for a χ² of 9.26. Either method may be used. Both make due allowance for the correlation which is present because the frequencies or proportions being compared are based on the same individuals.
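The item-difficulty example can be reproduced in a few lines. The sketch below computes formula (87) from the two "changed" cells, A = 29 and D = 10, together with the critical ratio (p2 − p1)/σ_D, using the standard error √(A + D)/N that appears in the numerical example above. The function names are illustrative.

```python
import math

def change_chi2(a, d):
    """Formula (87): 1-df chi square for the net change, from the two 'changed' cells A and D."""
    return (a - d) ** 2 / (a + d)

def change_critical_ratio(a, d, n):
    """(p2 - p1) / sigma_D, where p2 - p1 = (A - D)/N and sigma_D = sqrt(A + D)/N."""
    return ((a - d) / n) / (math.sqrt(a + d) / n)

a, d, n = 29, 10, 100          # Table 28, items 1 and 2
x2 = change_chi2(a, d)
cr = change_critical_ratio(a, d, n)
print(round(x2, 2), round(cr, 2), round(cr ** 2, 2))
```

The N cancels in the critical ratio, so squaring it gives exactly (A − D)²/(A + D). This is the sense in which the absence of N from (87) is more apparent than real.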

Correction for continuity. We have already pointed out that, since the tabled sampling distribution of χ² is continuous whereas obtained chi squares form a discrete series, the use of χ² when any one E is less than 5 is questionable. For fourfold contingency tables, an allowance for discontinuity can be made by applying Yates's correction for continuity, which should always be used when any one E in such a table is less than 5 and is advisable when an E is less than 10. A small E is most likely to occur either when the total N is small or when one or both of the marginal totals involve extreme dichotomies. It is easy to determine the smallest E by dividing the product of the 2 smaller marginal frequencies by the total N. Yates's correction can be incorporated in formula (86), which becomes

$$\chi^2 = \frac{N\left(|AD - BC| - \dfrac{N}{2}\right)^2}{(A + B)(C + D)(A + C)(B + D)} \qquad (86a)$$

and indicates that the absolute difference between AD and BC is to be reduced by N/2. Formula (87) can also be written to include a correction for continuity. The corrected form

$$\chi^2 = \frac{(|A - D| - 1)^2}{A + D} \qquad (87a)$$

involves decreasing the absolute value of the difference between A and D by 1. Formula (87a) is to be preferred to (87) when A + D is less than 20. The reasoning back of Yates's correction is precisely the same as that given on p. 48 of Chapter 5.
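Both corrected formulas are easy to code. The sketch below follows (86a) and (87a) as printed, with one addition that is an assumption and not part of the text: if the absolute difference is already smaller than the amount to be subtracted, the corrected difference is set to zero rather than allowed to go negative. The function names are illustrative.

```python
def fourfold_chi2_yates(a, b, c, d):
    """Formula (86a): |AD - BC| reduced by N/2 before squaring."""
    n = a + b + c + d
    # Floor at zero is our addition, not part of (86a) as printed.
    adj = max(abs(a * d - b * c) - n / 2, 0.0)
    return n * adj ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def change_chi2_corrected(a, d):
    """Formula (87a): |A - D| reduced by 1 before squaring."""
    adj = max(abs(a - d) - 1, 0)   # same zero floor, again our addition
    return adj ** 2 / (a + d)

# Table 28, items 1 and 2: corrected versions of the 5.93 and 9.26 found earlier.
print(round(fourfold_chi2_yates(29, 39, 22, 10), 2), round(change_chi2_corrected(29, 10), 2))
```

As expected, each correction lowers the value somewhat, to about 4.93 and 8.31 here.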

One-tailed vs. two-tailed test. It will be recalled from our discussion of the sampling distribution of χ² that the P's obtainable from Table D are the probabilities of the chance occurrence of as large a χ² as that observed; that is, levels of significance such as P = .05 or .01 or .001 are based on one (the right-hand) tail of the sampling distribution of χ². Does this mean that it is a one-tailed test in the hypothesis testing sense discussed earlier (pp. 62-64)? Let us recall a couple of facts. First, when using the binomial to indicate something of the nature of the χ² distribution, we saw that both tails of the binomial were combined as 1 tail of the χ² distribution. Second, for 1 degree of freedom χ² = (x/σ)². Now an x/σ of 1.96 corresponds to the P = .05 level as a two-tailed test. The square of 1.96 gives a χ² of 3.84, which we can see from Table D also corresponds to the .05 level. Hence the P's, for 1 degree of freedom, read from Table D are equivalent to those based on the two-tailed test despite the fact that only 1 tail of the χ² distribution is involved.



232 Frequency Comparison: Chi Square 

If the decision to be made or the hypothesis to be tested calls for a one-tailed test, the P's from Table D need to be halved: a χ² of 5.41 (instead of 6.64) is required for the .01 level, and a χ² of 2.71 (instead of 3.84) gives the .05 level. Incidentally, for 1 degree of freedom, a χ² P can, obviously, be obtained by entering its square root into the normal curve table; whether such a P from x/σ is based on one or both tails of the normal distribution depends on the hypothesis being tested. As we proceed, the student should convince himself that the notion of direction of differences, hence the idea of a one-tailed test, doesn't make sense in other applications of χ².
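The relation χ² = (x/σ)² for 1 degree of freedom can be checked numerically. This sketch (Python standard library only, using statistics.NormalDist) squares the unit-normal deviates for one- and two-tailed tests at the .05 and .01 levels; the results should agree with the 2.71, 3.84, 5.41, and 6.64 quoted above to within rounding.

```python
from statistics import NormalDist

unit_normal = NormalDist()   # mean 0, standard deviation 1


def chi2_1df_critical(alpha, tails):
    """Chi square (1 df) marking the alpha level for a one- or two-tailed test.

    For 1 df, chi square is the square of the unit-normal deviate x/sigma.
    """
    z = unit_normal.inv_cdf(1 - alpha / tails)
    return z ** 2


crit = {(alpha, tails): chi2_1df_critical(alpha, tails)
        for alpha in (0.05, 0.01) for tails in (1, 2)}
for (alpha, tails), value in sorted(crit.items()):
    print(f"alpha = {alpha}, {tails}-tailed: chi2 = {value:.3f}")
```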

Comparison of two or more correlated proportions. Formula (87) has recently† been extended to provide a method for testing whether 3 or more nonindependent proportions (or sets of frequencies) differ significantly among themselves. For example, we may have pass-fail (or yes-no, or some other dichotomous) information on C items (or questions) for N individuals; or we may have only 1 item with responses from N persons under C different conditions; or 1 item with responses from N sets of C matched persons each, that is, C matched groups.

Data from such situations can be arranged in a table consisting of N rows and C columns. The total number of passes (yeses) in a given column divided by N will, of course, be the proportion of passes (or yeses) in that column. Do these C proportions (or the totals) differ significantly in an over-all sense? The null hypothesis is that all the proportions are the same except for chance. To test the null hypothesis we will need to obtain not only the column totals (number of passes) but also a similar total for each of the N rows. Let T stand for the total in any column and X stand for the total in any row. This X is a sort of "score" for the person: his number of passes (or yeses) on the C items. Cochran shows that the sampling distribution of the quantity


$$Q = \frac{(C - 1)\left[C\sum T^2 - \left(\sum T\right)^2\right]}{C\sum X - \sum X^2} \qquad (88)$$


follows the χ² distribution with C − 1 degrees of freedom for N large (N > 30, presumably).

† Cochran, W. G., The comparison of percentages in matched samples, Biometrika, 1950, 37, 256-266.




The computation of Q is so easy that it need not be illustrated. If an obtained Q exceeds the χ² required for a chosen level of significance, one concludes that the (correlated) proportions do differ in an over-all sense, that is, they are not homogeneous. It can be argued that unless Q is significant one is not justified in singling out the proportions (or columns) which give large differences for the purpose of testing the significance of the difference, since such selection tends to capitalize on chance differences.
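Although the arithmetic of Q is simple, a short sketch may help in checking hand work. The function below applies formula (88) to an N by C table of 0/1 entries (1 = pass or yes). The 6 by 3 data set is hypothetical and far smaller than the N > 30 suggested for the χ² approximation, so it serves only as an arithmetic check. Note that the denominator is zero if every person passes all items or fails all items.

```python
def cochran_q(table):
    """Cochran's Q, formula (88), for an N x C table of 0/1 (fail/pass) entries."""
    c = len(table[0])                                            # number of columns C
    col_totals = [sum(row[j] for row in table) for j in range(c)]  # the T's
    row_totals = [sum(row) for row in table]                     # the X's
    sum_t = sum(col_totals)
    numerator = (c - 1) * (c * sum(t * t for t in col_totals) - sum_t ** 2)
    denominator = c * sum(row_totals) - sum(x * x for x in row_totals)
    return numerator / denominator


# Hypothetical data: 6 persons (rows) x 3 items (columns), 1 = pass
data = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 1, 0],
    [1, 1, 0],
    [1, 0, 0],
]
q = cochran_q(data)
print(f"Q = {q:.2f} with {len(data[0]) - 1} df")
```

Here the column totals are 5, 4, and 1, so ΣT² = 42, ΣT = 10, and ΣX² = 20, giving Q = 2(126 − 100)/(30 − 20) = 5.2.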
Chi square for 2 by k tables. The calculation of χ² from a table with 2 rows and k columns (or 2 columns and k rows) can be accomplished by way of expected cell frequencies calculated as previously suggested from the marginal totals or by means of

$$\chi^2 = \frac{N^2}{A_t B_t}\left[\sum \frac{B_i^2}{A_i + B_i} - \frac{B_t^2}{A_t + B_t}\right] \qquad (89)$$

in which the A's and B's have the meanings indicated in Table 32,
Table 32. The Calculation of χ² from a 2 by k Table: 2 Groups and k (= 5) Responses

                  Group I        Group II
Response          Col. A         Col. B         Col. C       Col. D          Col. E
                  Aᵢ             Bᵢ             Aᵢ + Bᵢ      Bᵢ/(Aᵢ + Bᵢ)    Bᵢ²/(Aᵢ + Bᵢ)
1                  27 (= A₁)      15 (= B₁)       42         .3571            5.36
2                  26 (= A₂)      16 (= B₂)       42         .3810            6.10
3                 247 (= A₃)     110 (= B₃)      357         .3081           33.89
4                  41 (= A₄)       8 (= B₄)       49         .1633            1.31
5                  39 (= A₅)      15 (= B₅)       54         .2778            4.17
Sum of rows 1-5                                                              50.83
Totals            380 (= Aₜ)     164 (= Bₜ)      544 (= N)   .3015           49.44
Difference                                                                    1.39

χ² = [(544)²/((380)(164))](1.39) = 6.60;  df = 4;  P = .16



wherein will be found the frequencies for 2 groups classified according to 5 response categories. The necessary computations required by formula (89) are also included in the table. Note that, as usual, the marginal totals are first found by summing across and down. Column D is obtained by dividing the entries in column B by the adjacent values in column C, and column E results from multiplying the D column values by the B column figures. These same operations, when applied to the last (or totals) line, lead to the column E entry of 49.44, which is the value of the Bₜ²/(Aₜ + Bₜ) term in formula (89). Summing the first 5 figures in column E yields 50.83, or the Σ term of (89), and the difference between 50.83 and 49.44 is 1.39, the value of the bracketed part of the formula. When this is multiplied by N²/AₜBₜ, we have χ² = 6.60, which for a df of 4 yields a P of about .16. In other words, about once in 6 trials differences as large as those in Table 32 would occur by chance; hence we have insufficient evidence for concluding that the universes from which these 2 samples were drawn differ in regard to their responses to the question asked.
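The following sketch reproduces the Table 32 computation by formula (89) and checks it against the expected-frequency route mentioned in the text; the two should agree. The data are those of Table 32, with B₁ = 15 (the value the column total of 164 requires). P is taken from the closed form of the χ² upper tail, which holds only for even df (here df = 4); function names are illustrative.

```python
import math


def chi2_formula_89(a, b):
    """Chi square for a 2 by k table by formula (89); a and b are the 2 groups' frequencies."""
    a_t, b_t = sum(a), sum(b)
    n = a_t + b_t
    column_e = [bi ** 2 / (ai + bi) for ai, bi in zip(a, b)]   # B_i^2 / (A_i + B_i)
    bracket = sum(column_e) - b_t ** 2 / n                     # minus B_t^2 / (A_t + B_t)
    return n ** 2 / (a_t * b_t) * bracket


def chi2_from_expected(rows):
    """Chi square from expected frequencies built from the marginal totals."""
    row_tot = [sum(r) for r in rows]
    col_tot = [sum(col) for col in zip(*rows)]
    n = sum(row_tot)
    return sum((obs - rt * ct / n) ** 2 / (rt * ct / n)
               for r, rt in zip(rows, row_tot)
               for obs, ct in zip(r, col_tot))


def chi2_sf_even_df(x, df):
    """Upper-tail P for chi square, closed form valid only for even df."""
    term = total = 1.0
    for i in range(1, df // 2):
        term *= (x / 2) / i
        total += term
    return math.exp(-x / 2) * total


A = [27, 26, 247, 41, 39]   # Table 32, column A
B = [15, 16, 110, 8, 15]    # Table 32, column B
chi2 = chi2_formula_89(A, B)
df = len(A) - 1
p = chi2_sf_even_df(chi2, df)
print(f"formula (89): {chi2:.3f}; expected-frequency route: {chi2_from_expected([A, B]):.3f}")
print(f"df = {df}, P = {p:.2f}")
```

Carried out without rounding, the sketch gives χ² ≈ 6.54 rather than 6.60; the 6.60 comes from using the bracketed difference rounded to 1.39 (unrounded, it is about 1.378). Either way P ≈ .16.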

If one had to depend upon the CR technique for testing the significance of the group differences in Table 32, 5 critical ratios would result: for each category there is a possible difference in proportions or percentages, with a standard error for each difference. The 5 CR's might, and usually would, lead to 5 different P values, with a consequent predicament as to interpretation. Offhand, it might be argued that, if any CR or P so determined reached an acceptable level of significance, one would be justified in concluding that the difference between the groups was real rather than chance. That such an argument may be fallacious is well illustrated by the data of Table 32, which are actual data. When these data first came to the author's attention, the table was in percentage form, with a CR worked out only for the category showing the largest difference. This CR, based on formula (21), was 2.54, which is near the P = .01 level of significance, and it had accordingly been concluded that a real difference had been found. Now, when we consider the χ² P of .16 for the over-all comparison, we are not justified in placing much confidence in such a conclusion.
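To see which single comparison would have tempted an investigator, the sketch below computes a CR for each of the 5 categories of Table 32 (with B₁ = 15, as the column total requires). Formula (21) is not reproduced in this section, so the sketch assumes the usual large-sample form for independent proportions, with each group's own p·q/n in the standard error; on that assumption category 4 gives a CR of about 2.55, close to the 2.54 quoted.

```python
import math


def critical_ratio(x1, n1, x2, n2):
    """CR for 2 independent proportions; SE built from each group's own p*q/n (assumed form)."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / se


A = [27, 26, 247, 41, 39]   # Table 32, column A
B = [15, 16, 110, 8, 15]    # Table 32, column B
n_a, n_b = sum(A), sum(B)
crs = [critical_ratio(a, n_a, b, n_b) for a, b in zip(A, B)]
for category, cr in enumerate(crs, start=1):
    print(f"category {category}: CR = {cr:+.2f}")
largest = max(range(len(crs)), key=lambda k: abs(crs[k]))
print(f"largest |CR| is for category {largest + 1}")
```

Only category 4 comes near the .01 point; the other 4 CR's are all small, which is exactly the situation in which picking out the largest one misleads.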

Why the apparent inconsistency between 2 tests of significance? Since most investigators are looking for group differences rather than group similarities, there is the tendency to single out a category for comparison, not because of intrinsic a priori interest in that category, but because it happens to yield the largest difference. By this a posteriori selection one tends to capitalize on differences which may be large mainly as a result of chance. A similar situation occurs when we have the means for several groups: the largest of the possible differences may be the largest partly or entirely as a result of chance. As will be seen in the discussion of the analysis of variance in Chapter 15, before any one difference is tested, an over-all test of significance should be applied. If this over-all test yields a significant P, then and only then is one justified in proceeding to an examination of single categories. Thus the use of χ² for such situations as are exemplified in Table 32 not only provides an over-all single index of significance but also helps us avoid false conclusions.
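A small simulation makes the danger of a posteriori selection concrete. Assuming the null hypothesis is true, with both groups (380 and 164 cases) drawn from the pooled category proportions of Table 32, the sketch records how often the largest of the 5 CR's exceeds 2.58, the nominal two-tailed .01 point. The CR form (each group's own p·q/n), the seed, and the number of trials are assumptions of the sketch, and the exact rate printed depends on them.

```python
import math
import random


def largest_abs_cr(counts_a, counts_b):
    """Largest |CR| over categories, each CR = (p1 - p2) / sqrt(p1 q1 / n1 + p2 q2 / n2)."""
    n_a, n_b = sum(counts_a), sum(counts_b)
    largest = 0.0
    for a, b in zip(counts_a, counts_b):
        p1, p2 = a / n_a, b / n_b
        var = p1 * (1 - p1) / n_a + p2 * (1 - p2) / n_b
        if var > 0:                       # skip a category empty in both samples
            largest = max(largest, abs(p1 - p2) / math.sqrt(var))
    return largest


def null_hit_rate(trials=2000, seed=1):
    """Share of chance-only samples whose largest CR passes 2.58 (nominal .01, two-tailed)."""
    rng = random.Random(seed)
    weights = [42, 42, 357, 49, 54]       # pooled category totals of Table 32
    categories = list(range(len(weights)))
    hits = 0
    for _ in range(trials):
        draws_a = rng.choices(categories, weights=weights, k=380)
        draws_b = rng.choices(categories, weights=weights, k=164)
        counts_a = [draws_a.count(c) for c in categories]
        counts_b = [draws_b.count(c) for c in categories]
        if largest_abs_cr(counts_a, counts_b) > 2.58:
            hits += 1
    return hits / trials


rate = null_hit_rate()
print(f"largest of 5 CRs beyond 2.58 in {rate:.1%} of null samples (nominal 1%)")
```

The rate comes out well above the nominal 1 per cent, which is the sense in which singling out the largest difference capitalizes on chance.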

Application to k by l tables. Consider the data of Table 33,

Table 33. Table of Frequency of 3 Possible Responses for 3 Groups of Individuals (Percentages in the Parentheses Add Downward to 100)*

Motivation of                          Group
Conscientious Objectors     I             II            III           Total
Not cowards                 24 (27.0)     56 (53.8)     71 (69.6)     151
Partly cowards              30 (33.7)     23 (22.1)     19 (18.6)      72
Cowards                     35 (39.3)     25 (24.0)     12 (11.8)      72
Total                       89 (100.0)   104 (99.9)    102 (100.0)    295

* Data from Leo Crespi, J. Psychol., 1945, 19, p. 285.

which contains a contingency-type table involving 3 groups and 3 possible opinion responses. To test the significance of the differences between the groups by use of the CR technique would involve comparing the percentages for group I vs. II, I vs. III, and II vs. III, for each of the 3 responses: a total of 9 CR's. Even though there is no short-cut formula for computing χ² for such a table, its calculation is far quicker than the determination of 9 CR's. Straightforward computation gives χ² = 36.58, which for df = 4 is double the value of the χ² needed for the P = .001 point. From Elderton's table we find that P is about .000001; hence Table 33 as a whole exhibits highly significant differences between the groups.

Perhaps a better understanding of the extent of the differences can be had by considering the percentages given in parentheses in the table. Membership in group III means a greater tendency toward the "not cowards" response. Group I tends more to give the "cowards" response. Now it happens that the 3 groups, I, II, and III, can be (and are) placed in an ordered series for amount of education: grammar school, high school, and college respectively. Thus the association shown in the table is in the direction of less disparagement of conscientious objectors by those at the higher educational level. The strength of association or degree of correlation is represented by a contingency coefficient of .33, which may seem rather low in light of the highly significant P. This illustrates a point which most readers will already have grasped: high statistical significance and a high degree of association are far from synonymous. Consideration of the data of Table 33 readily indicates the difficulty of predicting responses when the extent of association is represented by a C of .33.
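A sketch of the straightforward computation for Table 33, using expected frequencies from the marginal totals, followed by the contingency coefficient C = √(χ²/(N + χ²)). The cowards/group I cell is taken as 35, the value the column total of 89 and the 39.3 per cent require. The P line uses the closed-form χ² tail for 4 df and is not general.

```python
import math


def chi2_contingency(rows):
    """Chi square for a k by l table, expected frequencies from the marginal totals."""
    row_tot = [sum(r) for r in rows]
    col_tot = [sum(c) for c in zip(*rows)]
    n = sum(row_tot)
    chi2 = 0.0
    for r, rt in zip(rows, row_tot):
        for obs, ct in zip(r, col_tot):
            expected = rt * ct / n
            chi2 += (obs - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(col_tot) - 1)
    return chi2, df, n


table_33 = [
    [24, 56, 71],   # not cowards      (groups I, II, III)
    [30, 23, 19],   # partly cowards
    [35, 25, 12],   # cowards
]
chi2, df, n = chi2_contingency(table_33)
C = math.sqrt(chi2 / (n + chi2))              # contingency coefficient
p = math.exp(-chi2 / 2) * (1 + chi2 / 2)     # upper-tail P, closed form for df = 4 only
print(f"chi2 = {chi2:.2f}, df = {df}, P = {p:.1e}, C = {C:.2f}")
```

The sketch gives χ² ≈ 36.68, slightly above the 36.58 quoted; the difference presumably reflects rounding in the hand computation and does not affect the conclusion. C comes out at .33, as stated.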

As in the 2 by k table, so here it is better to calculate an over-all χ² before examining by the CR technique any of the possible separate comparisons. Unless the χ² P is significant, it is unwise to proceed with such comparisons.

Goodness of fit. The use of χ² in testing the goodness of fit of a theoretical curve to an observed frequency distribution is illustrated in Table 34. One starts with an actual distribution, usually with more grouping intervals than in our example, and the descriptive statistical measures therefor. In fitting the normal curve to the distribution of Table 34, we need N, M, and σ. To set up for each interval the frequency which would hold for the best-fitting normal curve, we go through the tedious process of determining the proportionate area under the theoretical curve for each interval. Once the proportions are known, each is multiplied by N to secure the expected frequencies. The proportions are ascertained by calculating the x/σ value of the boundary limits of the intervals. For example, the 110-119 interval may be thought of as running from 109.5 to 119.5, since IQ's are rounded to the nearest integer. Then (109.5 − 104.56)/16.99 = .2907 as the x/σ for the lower limit, and (119.5 − 104.56)/16.99 = .8793 as the x/σ for the upper limit of the 110-119 interval. Of course, .8793 is also the lower limit for the 120-129 interval. Now the difference, .8793 − .2907 = .5886, is the same as 10/16.99, or i/σ, which is the interval width expressed in x/σ units. Adding .5886 once to .2907 gives .879 (it is sufficient to retain 3 decimals); adding it twice gives 1.468; and so on. Then subtracting .5886 once from .2907 gives −.298, subtracting twice gives −.886, etc.
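The shortcut of adding and subtracting i/σ can be checked against direct computation of each boundary. This sketch uses M = 104.56, σ = 16.99, and an interval width of 10, all from the text; both routes should give the same x/σ limits, from −2.652 (at 59.5) up to 2.645 (at 149.5).

```python
M, SIGMA = 104.56, 16.99   # mean and sigma of the IQ distribution
WIDTH = 10                 # interval width i

start = (109.5 - M) / SIGMA    # lower limit of the 110-119 interval in x/sigma units
step = WIDTH / SIGMA           # i/sigma, the interval width in x/sigma units

# Shortcut described in the text: add or subtract i/sigma repeatedly
shortcut = [round(start + k * step, 3) for k in range(-5, 5)]
# Direct computation for each real limit 59.5, 69.5, ..., 149.5
direct = [round((59.5 + WIDTH * j - M) / SIGMA, 3) for j in range(10)]
print(shortcut)
print(direct)
```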



Table 34. Goodness of Fit of Normal Curve to Stanford-Binet IQ's, Form M

IQ        O           x/σ       Proportionate    E      O − E    (O − E)²/E
                      (lower    Area
                      limit)
160        3 }
150       13 } 16     2.645     .0041             12       4        1.33
140       55          2.057     .0158             47       8        1.36
130      120          1.468     .0512            152     −32        6.74
120      330           .879     .1186            352     −22        1.38
110      610           .291     .1958            582      28        1.35
100      719          −.298     .2316            688      31        1.40
 90      592          −.886     .1950            579      13         .29
 80      338         −1.475     .1177            350     −12         .41
 70      130         −2.064     .0506            150     −20        2.67
 60       48         −2.652     .0155             46       2         .09
 50        7 }
 40        4 }
 30        1 } 12               .0040             12       0         .00
Total   2970                    .9999           2970       0       17.02 = χ²

M = 104.56    σ = 16.99    df = 11 − 3 = 8    P = .03

When the boundary limits in terms of x/σ have been set up, the proportionate area for a given interval is found by using the table of normal curve areas. The 2 top intervals have been combined, and likewise the 3 bottom intervals, so as to have no expected frequencies less than 10. The proportionate areas .0041 and .0040 represent the areas beyond given points, and the E's at top and bottom are the numbers of cases expected beyond these same points. Note that the sum of the proportions should be unity within limits of rounding errors, and that the sum of the expected frequencies should be the same as the sum of the observed frequencies. Perhaps it is unnecessary to point out that the expected frequencies form an exactly (within limits of rounding errors and for the given intervals) normal distribution which will yield the same M and σ as the observed distribution with which we started.

Straightforward calculation gives a χ² of 17.02. With df = 11 − 3 = 8 (number of intervals minus the number of constants used in the fitting), P = .03; i.e., only 3 times in 100 would as large a χ² arise by chance, or only 3 times in 100 would we get a worse fit if the universe of IQ's were distributed as a normal curve. This would lead one to question whether IQ's, as measured by Form M of the 1937 Revision of the Stanford-Binet, are distributed in the normal curve fashion. The same data with intervals of size 5 give a χ² P of .003, and the degree of kurtosis (by moments) is thrice its standard error; therefore one can conclude that the observed distribution is not a chance departure from a normal distribution.
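The whole fitting can be sketched in a few lines of Python: normal-curve areas from math.erf, expected frequencies as N times the areas, χ² summed over the 11 intervals of Table 34 (top 2 intervals combined into 16 cases, bottom 3 into 12, as in the table), and P from the closed-form tail for even df. Because the sketch does not round the expected frequencies to whole numbers, its χ² will differ a little from the 17.02 of Table 34.

```python
import math

M, SIGMA, N = 104.56, 16.99, 2970
# Observed frequencies of Table 34, lowest interval first, end intervals combined
observed = [12, 48, 130, 338, 592, 719, 610, 330, 120, 55, 16]
limits = [59.5 + 10 * j for j in range(10)]   # real limits 59.5, 69.5, ..., 149.5


def normal_cdf(z):
    """Area of the unit normal curve below z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def chi2_sf_even_df(x, df):
    """Upper-tail P for chi square, closed form valid only for even df."""
    term = total = 1.0
    for i in range(1, df // 2):
        term *= (x / 2) / i
        total += term
    return math.exp(-x / 2) * total


cdf = [0.0] + [normal_cdf((x - M) / SIGMA) for x in limits] + [1.0]
areas = [hi - lo for lo, hi in zip(cdf, cdf[1:])]    # proportionate areas, tails included
expected = [N * a for a in areas]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 3            # N, M, and sigma were used in the fitting
p = chi2_sf_even_df(chi2, df)
print(f"chi2 = {chi2:.2f}, df = {df}, P = {p:.3f}")
```

With unrounded expected frequencies, χ² comes out close to 17 and P close to .03 for df = 8, in agreement with the text.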

Thus the χ² technique provides us with a test by means of which we can judge that the frequencies of a given distribution do not follow the frequencies of a theoretical curve closely enough to be regarded as chance departures therefrom. Note that a smaller value of χ² for the example of Table 34 would not prove that the universe is normal, even though the P were as large as .90 or .95. This would merely indicate that the given data were consistent with the normal distribution. As a matter of fact, so-called excellent fits leading to P's of .99 or more are suspect. When P = .01, it is said that chance sampling would lead to a worse fit only once in 100 times; when P = .99, it is said that chance sampling would lead to a better fit only once in 100 times. In other words, if P is between .05 and .01, the hypothesis that the universe distribution is of the normal type (or whatever type was fitted) is questionable; if P is .01 or less, this hypothesis is rejected; if P is between .95 and .99, one may suspect the fit as being too good; if P is .99 or more, one should definitely look for an error in calculation or for some type of restraint on the operation of chance. Too good a fit is as open to question as too poor a fit. If P is between .05 and .95, the fit is said to be satisfactory.
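The rule of thumb above can be written as a small function. How P values falling exactly on .01, .05, .95, or .99 are assigned is this sketch's own choice, since the text gives only the bands; the wording of the labels is likewise illustrative.

```python
def judge_fit(p):
    """Verbal reading of a goodness-of-fit P, following the bands given in the text.

    Assignment of P values lying exactly on .01, .05, .95 or .99 is this sketch's choice.
    """
    if p <= 0.01:
        return "reject: fit too poor"
    if p < 0.05:
        return "questionable"
    if p <= 0.95:
        return "satisfactory"
    if p < 0.99:
        return "suspiciously good"
    return "look for an error or a restraint on chance"


for p in (0.001, 0.03, 0.50, 0.97, 0.995):
    print(p, judge_fit(p))
```

For the P of .03 found for Table 34, judge_fit returns "questionable".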

When one is testing the goodness of fit of frequency curves, the df depends upon the number of grouping intervals and upon the number of restrictions imposed, or the ways in which the expected distribution is made to agree with the observed distribution. The general principle back of the determination of df for χ² as a test of fit may be illustrated for the case of testing the goodness of fit of the normal curve. The expected and observed distributions are made to agree with respect to N, M, and σ. Suppose that we have k grouping intervals, that we let fₜ stand for the frequency in the tth interval and Xₜ for its score value (midpoint), and that xₜ represents the corresponding deviation score value for this midpoint. Then the following equations will hold:

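$$f_1 + f_2 + f_3 + \cdots + f_t + \cdots + f_k = N$$

$$f_1X_1 + f_2X_2 + f_3X_3 + \cdots + f_tX_t + \cdots + f_kX_k = NM$$

$$f_1x_1^2 + f_2x_2^2 + f_3x_3^2 + \cdots + f_tx_t^2 + \cdots + f_kx_k^2 = N\sigma^2$$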

Now, if all the f values were known except f₁, f₂, and f₃, those parts to the right of the f₃ term in the first of these equations could be added numerically. The resulting sum could be shifted to the right of the equality sign and then combined numerically with N, giving an equation of the type f₁ + f₂ + f₃ = A, where A equals N minus the sum of all the frequencies save the first 3. Likewise, the parts beyond the f₃ term in each of the other 2 equations could be summed numerically, shifted to the right, and combined numerically with the constant: NM for the second and Nσ² for the third equation.

This procedure will lead to 3 simultaneous equations with f₁, f₂, and f₃ as the unknowns:

$$f_1 + f_2 + f_3 = A$$

$$f_1X_1 + f_2X_2 + f_3X_3 = B \quad \text{(say)}$$

$$f_1x_1^2 + f_2x_2^2 + f_3x_3^2 = C \quad \text{(say)}$$

It is a well-known principle of algebra that 3 equations in 3 unknowns will be satisfied (if solvable) by just 1 set of values for the unknowns. For our particular problem, this means that, as soon as the frequencies for all but 3 (any 3) intervals are known, these 3 remaining frequencies are not "free to vary"; they are fixed because of the requirements that the frequencies or functions thereof must add to N, NM, and Nσ². We accordingly lose 3 degrees of freedom, and therefore when we are testing the fit of a normal curve to a distribution with k intervals, the df is k − 3.
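The loss of 3 degrees of freedom can be demonstrated with exact arithmetic. In this sketch (a hypothetical 6-interval distribution, with Python's fractions module for exactness), f₁, f₂, and f₃ are treated as unknown, the known terms are moved to the right-hand side as described, and the resulting 3 equations are solved by Cramer's rule. The 3 "unknown" frequencies are recovered uniquely, so they were never free to vary.

```python
from fractions import Fraction


def det3(m):
    """Determinant of a 3 x 3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m, rhs):
    """Solve 3 equations in 3 unknowns by Cramer's rule (exact with Fractions)."""
    d = det3(m)
    solution = []
    for col in range(3):
        replaced = [row[:] for row in m]
        for r in range(3):
            replaced[r][col] = rhs[r]
        solution.append(det3(replaced) / d)
    return solution


# Hypothetical grouped distribution: midpoints X_t and frequencies f_t
X = [1, 2, 3, 4, 5, 6]
f = [2, 5, 9, 7, 4, 3]
N = sum(f)
M = Fraction(sum(ft * Xt for ft, Xt in zip(f, X)), N)
x = [Xt - M for Xt in X]                                  # deviation scores x_t
N_sigma2 = sum(ft * xt ** 2 for ft, xt in zip(f, x))      # N times sigma squared

# Suppose only f_4, f_5, f_6 were known: move their terms to the right-hand side.
known = range(3, 6)
A = N - sum(f[t] for t in known)
B = N * M - sum(f[t] * X[t] for t in known)
C = N_sigma2 - sum(f[t] * x[t] ** 2 for t in known)

coefficients = [[1, 1, 1], X[:3], [xt ** 2 for xt in x[:3]]]
recovered = solve3(coefficients, [A, B, C])
print(recovered)   # the 3 "unknown" frequencies are fully determined
```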

If we wished to ascertain whether the observed distribution of Table 34 could be thought of as a chance departure from a normal curve with mean equal to 100, the expected frequencies would be so set up as to yield the observed σ and N, but with M = 100. The df would therefore be 11 − 2, since the distributions are forced to agree only as to 2 constants, N and σ; hence 2 degrees of freedom are lost.

Chi square can be used to test the significance of the difference between 2 observed frequency distributions, but this simply becomes a 2 by k table with expected values computed from the marginal totals as previously indicated. In such a situation, it is incorrect to treat either set of frequencies as those expected, against which the other is compared as a set of observed values. Such a procedure does not allow for the fact that both sets of frequencies are subject to sampling fluctuations. If one set of frequencies is for the universe, and the second set is based on a sample from the universe, then the universe frequencies (or proportions) can be used to set up expected frequencies, against which the sample values may be checked in order to test whether the sample represents the universe within the limits of chance sampling error. The df becomes k − 1, since this requires only that N_E = N_O.
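For the case of a sample checked against known universe proportions, the sketch below sets each expected frequency to N times the universe proportion, so that N_E = N_O, and refers χ² to k − 1 df. The proportions and counts are hypothetical; the P line uses the closed form exp(−χ²/2), which is valid only for 2 df.

```python
import math


def chi2_against_universe(sample_counts, universe_props):
    """Chi square for a sample against known universe proportions; df = k - 1."""
    n = sum(sample_counts)
    expected = [n * p for p in universe_props]      # makes N_E equal N_O
    chi2 = sum((o - e) ** 2 / e for o, e in zip(sample_counts, expected))
    return chi2, len(sample_counts) - 1


chi2, df = chi2_against_universe([30, 45, 25], [0.2, 0.5, 0.3])   # hypothetical data
p = math.exp(-chi2 / 2)   # upper-tail P; this closed form holds only for df = 2
print(f"chi2 = {chi2:.2f}, df = {df}, P = {p:.3f}")
```

Here the expected frequencies are 20, 50, and 30, giving χ² = 5 + .5 + .83 ≈ 6.33 with 2 df.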

In this chapter we have discussed the essential nature of χ² and have pointed out typical applications. By now the student should appreciate the advantages of χ² over percentage comparisons and have some insight into the use of χ² as a means of testing hypotheses.

EXACT OR DIRECT PROBABILITIES 

The χ² P's obtainable from Table D are approximations, in that areas under a continuous curve are taken as estimates of values which form a point distribution. Even with Yates's correction for continuity, the approximation is none too good when E values are less than 5. This raises the question as to the criterion for judging the closeness of such approximations, and the answer is that for situations involving 1 degree of freedom it is possible to specify exact probabilities. How?

First, consider the problem of deciding, on the basis of a specified number of successes, whether someone can distinguish between 2 cigarette brands. We learned in Chapter 6 that the exact P for the probability of as many correct identifications can be obtained by the binomial distribution; hence we need not use the normal curve or the χ² approximation. But such approximations are not only very convenient computationally for N (or n) large, but also are accurate enough. In checking a χ² P against an exact P derived from the binomial, one must bear in mind the possibility of confusing one- and two-tailed tests; both methods should be alike in this regard.

Second, consider the test of the significance of change (or difference between 2 correlated frequencies or proportions) given by formula (87). An exact P can be obtained for this situation by resort to the binomial (see p. 58). Again, in calculating the binomial P, one must give consideration to whether he had intended a one-tailed or a two-tailed test.
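For the significance of change, only the A + D cases that changed enter the test, and under the null hypothesis each is equally likely to fall in A or D, so the exact P is a binomial tail with p = 1/2 (the same calculation serves the cigarette-identification problem with n trials). The sketch compares that exact two-tailed P with the P from formula (87a); the frequencies 15 and 5 are hypothetical.

```python
import math


def binomial_upper_tail(k, n, p=0.5):
    """P(X >= k) for X binomial with n trials and success probability p."""
    return sum(math.comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k, n + 1))


A, D = 15, 5                      # hypothetical change frequencies
n = A + D                         # only the changed cases enter the test
p_one = binomial_upper_tail(max(A, D), n)      # exact one-tailed P
p_two = min(1.0, 2 * p_one)                    # exact two-tailed P (p = .5 is symmetric)

chi2_corr = (abs(A - D) - 1) ** 2 / n          # formula (87a)
p_chi2 = math.erfc(math.sqrt(chi2_corr / 2))   # its two-tailed P
print(f"exact two-tailed P = {p_two:.4f}; chi square (87a) P = {p_chi2:.4f}")
```

Here the exact two-tailed P is about .041 and the corrected χ² gives about .044.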
Third, consider the fourfold table for which formula (86) is appropriate in testing either the significance of association or the significance of the difference between 2 groups. For this situation the binomial is not applicable (except when the frequencies are equal on one, or both, of the margins). Exact P's can be obtained for such tables by a rather tedious procedure which we shall now describe. It can be shown that the probability for a particular observed set of frequencies, A, B, C, and D, for fixed margins is

$$P = \frac{(A + B)!\,(C + D)!\,(A + C)!\,(B + D)!}{N!\,A!\,B!\,C!\,D!}$$

To have a test comparable to the usual significance test, we would also need the P's for all sets of frequencies deviating farther than the observed set from the null values of no association. This can be made clearer by an example. In Table 35 will be found

Table 35. Series of Fourfold Table Frequencies Required for Calculating P Directly and Exactly

          I                    II                   III
      −     +             −     +             −     +
      3     9    12       2    10    12       1    11    12
      6     2     8       7     1     8       8     0     8
      9    11    20       9    11    20       9    11    20


an observed set (part I) and sets showing higher association (parts II and III). Note that each part is derived from the preceding part by subtracting 1 from both A and D and adding 1 to both B and C. This process is continued until A or D or both become zero. Note that the marginal frequencies remain the same.

Application of the foregoing formula to each table in turn will yield the probability for each set of frequencies, and the sum of these P's will be the probability of as great an association (in the given direction) as that indicated by the starting (observed) set of frequencies. We have


$$P_{\mathrm{I}} = \frac{12!\,8!\,9!\,11!}{20!\,3!\,9!\,6!\,2!}$$

$$P_{\mathrm{II}} = \frac{12!\,8!\,9!\,11!}{20!\,2!\,10!\,7!\,1!}$$

$$P_{\mathrm{III}} = \frac{12!\,8!\,9!\,11!}{20!\,1!\,11!\,8!\,0!}$$


The sum of these separate probabilities gives P = .0399, or .04 (to 2 decimals), as the probability of obtaining sets as extreme (in 1 direction) as the set observed in part I of Table 35. If the situation calls for a two-tailed test, P = .0798, or .08, is the probability of as large a difference (or as great an association) irrespective of direction. This P value of .0798 may be compared with a P of .082 when χ² is computed by formula (86a) and with a χ² P of .055 when formula (86) is used. As expected, correction for continuity appreciably improves the estimate of P.

The computation of the separate P's, laborious even with an ordinary table of logarithms, is greatly facilitated by a table of the logarithms of factorials, such as Table XLIX of Part I of Pearson's Tables for Statisticians and Biometricians.
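The summation for Table 35 is easy to automate with exact rational arithmetic, which sidesteps the table of log factorials. The sketch applies the factorial formula to part I and to each more extreme table, then compares the total with a continuity-corrected χ² P. The corrected χ² here uses the N/2 reduction of |AD − BC| described for formula (86a); the exact layout of (86a) is assumed rather than quoted.

```python
import math
from fractions import Fraction
from math import factorial


def table_probability(a, b, c, d):
    """Exact probability of one fourfold table with its margins fixed."""
    n = a + b + c + d
    numerator = factorial(a + b) * factorial(c + d) * factorial(a + c) * factorial(b + d)
    denominator = factorial(n) * factorial(a) * factorial(b) * factorial(c) * factorial(d)
    return Fraction(numerator, denominator)


def one_tail_exact(a, b, c, d):
    """Sum P over the observed table and every more extreme table in the same direction.

    Assumes AD < BC, so that moving 1 case out of A and D and into B and C
    increases the association, as in Table 35.
    """
    total = Fraction(0)
    while True:
        total += table_probability(a, b, c, d)
        if a == 0 or d == 0:
            break
        a, b, c, d = a - 1, b + 1, c + 1, d - 1
    return total


A, B, C, D = 3, 9, 6, 2               # part I of Table 35
p_one = one_tail_exact(A, B, C, D)
p_two = 2 * p_one

# Chi square with the N/2 continuity correction, for comparison (assumed form of (86a))
N = A + B + C + D
chi2_corr = N * (abs(A * D - B * C) - N / 2) ** 2 / ((A + B) * (C + D) * (A + C) * (B + D))
p_corr = math.erfc(math.sqrt(chi2_corr / 2))
print(f"exact: one-tailed {float(p_one):.4f}, two-tailed {float(p_two):.4f}")
print(f"corrected chi square P: {p_corr:.3f}")
```

The sketch reproduces P = .0399 (one-tailed) and .0798 (two-tailed); the corrected χ² gives about .081, in line with the .082 quoted.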



CHAPTER 14 


Comparison of Variabilities 


For samples with N's greater than 100, the standard error of a standard deviation, σ_σ = σ/√(2N), can be used to test hypotheses with regard to population standard deviations, and also can be used in formulas (27a) and (27b), p. 88, to determine the significance of the difference between standard deviations. We have already pointed out the fact that the sampling distribution of σ (and of the unbiased estimate, s) is skewed when N is small; hence we need methods for testing hypotheses about variabilities which make allowance for the skewness of the sample σ's or s's. The student will recall that s² = Σx²/(N − 1), but he may need to refer back to p. 107 for computational procedures. From now
on we shall use the symbol jgf in place of 

It can be shown that Σx²/σ² [exactly equivalent to (N − 1)s²/σ²] will, for successive random samples, be distributed as χ² with N − 1 degrees of freedom. This fact permits exact tests of hypotheses regarding a single variance and also provides a method of setting confidence limits for a population variance, but since there seems to be little if any need for such statistical activity in psychology, we shall not elaborate further here. The enterprising student will be able to set up the procedures.
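For the enterprising student, a minimal sketch of the single-variance test: (N − 1)s²/σ² is referred to χ² with N − 1 df. The numbers (N = 11, s² = 150, hypothesized σ² = 100) are hypothetical, and the upper-tail P uses the closed form for even df, so the sketch as written covers only odd N.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail P for chi square, closed form valid only for even df."""
    term = total = 1.0
    for i in range(1, df // 2):
        term *= (x / 2) / i
        total += term
    return math.exp(-x / 2) * total


def single_variance_chi2(s2, n, hypothesized_var):
    """(N - 1) s^2 / sigma^2, the same as sum of x^2 / sigma^2; df = N - 1."""
    return (n - 1) * s2 / hypothesized_var, n - 1


chi2, df = single_variance_chi2(s2=150.0, n=11, hypothesized_var=100.0)  # hypothetical
p_upper = chi2_sf_even_df(chi2, df)
print(f"chi2 = {chi2:.1f}, df = {df}, P = {p_upper:.3f}")
```

χ² = 15.0 on 10 df gives an upper-tail P of about .13, so this hypothetical s² would not be judged significantly larger than 100.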
When testing the difference between 2 standard deviations or 2 variances we must, as always, distinguish between situations involving correlated values and situations in which the measures are independent (or based on independent samples). The methods about to be presented are applicable for both small and large samples and are based on differences between variances rather than differences between standard deviations.

Differences between correlated variances. Correlated variabilities arise when we have 2 forms of a psychological test administered to the same group, with a σ or s for each form; or when we have the σ for a first trial vs. the σ for a later trial for the same sample; or σ's for the performance of 1 group under different experimental conditions; or σ's based on 2 groups (N pairs of individuals) related by blood or related by matching. For such situations the difference between variabilities can be tested by

$$t = \frac{(s_1^2 - s_2^2)\sqrt{N - 2}}{\sqrt{4 s_1^2 s_2^2 (1 - r_{12}^2)}} \qquad (90)$$

or its exact equivalent with s₁² and s₂² replaced by σ₁² and σ₂², in which r₁₂ is the correlation between the 2 sets of paired measures. This t follows the t distribution with N − 2 degrees of freedom.
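A sketch of the t test for correlated variances, in the form t = (s₁² − s₂²)√(N − 2)/√(4s₁²s₂²(1 − r₁₂²)) usually credited to Pitman and Morgan, which we take to be the form of formula (90). It computes t from paired scores both with s² (divisor N − 1) and with σ² (divisor N), to show that the 2 versions give the identical t, as the text states. The paired scores are hypothetical.

```python
import math


def correlated_variance_t(x, y, unbiased=True):
    """t for the difference between 2 correlated variances, df = N - 2 (N = number of pairs).

    Uses t = (v1 - v2) sqrt(N - 2) / sqrt(4 v1 v2 (1 - r^2)), with v = s^2 if
    unbiased is True and v = sigma^2 otherwise.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    divisor = n - 1 if unbiased else n
    v1, v2 = sxx / divisor, syy / divisor
    r = sxy / math.sqrt(sxx * syy)
    t = (v1 - v2) * math.sqrt(n - 2) / math.sqrt(4 * v1 * v2 * (1 - r ** 2))
    return t, n - 2


form_1 = [12, 15, 9, 20, 17, 11, 14, 18]   # hypothetical scores, same 8 people
form_2 = [10, 13, 11, 15, 14, 12, 12, 16]
t_s, df = correlated_variance_t(form_1, form_2)
t_sigma, _ = correlated_variance_t(form_1, form_2, unbiased=False)
print(f"t = {t_s:.3f} (with s^2), {t_sigma:.3f} (with sigma^2), df = {df}")
```

The common factor (N − 1)/N cancels between numerator and denominator, which is why the 2 versions agree exactly.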
Differences between independent variances. For the purpose of testing the difference between uncorrelated σ's or s's, Professor R. A. Fisher developed the mathematics of the sampling distribution of a function designated by z and defined as

$$z = \log_e s_1 - \log_e s_2 \qquad (91)$$

If successive samples are drawn from a single universe or from 2 universes having the same variance, the sampling variation of z will center at zero and depend upon n₁ and n₂, the two df's. Note that the sampling distribution is independent of the universe value of the variance or standard deviation. In other words, we do not require an estimate of a standard error which uses information from the samples, as required for the standard error of the difference between σ's. Probability tables for the z function are available by which one can, for given df's, i.e., n₁ and n₂, find how large z must be for the .05, the .01, and the .001 levels of significance.

The z defined by formula (91) has 1 disadvantage: logarithms must be used. Since (91) can be written in the equivalent form

$$z = \frac{1}{2}\log_e \frac{s_1^2}{s_2^2} \qquad (92)$$

it is seen that, instead of a difference between 2 logarithms, we have z as a function of the ratio of the 2 estimated variances. From the sampling distribution of one-half the log of a ratio, the sampling distribution of the ratio itself can be inferred. For n₁ = 5 and n₂ = 16, the value of z which will be exceeded 1 per cent of the time by chance (the .01 probability level) is .7450. This is one-half the log of the ratio of the 2 variances, and hence the log of the ratio would be 1.4900; by reference to a table of natural logarithms, the antilog of 1.4900 is found to be 4.44. That is, as large a ratio as 4.44 would occur .01 of the time by chance. In order to avoid the necessity of using logs, Professor George W. Snedecor has developed tables for the variance ratio, which is defined as

$$F = \frac{s_1^2}{s_2^2} \qquad (93)$$


The equation of the sampling distribution of F contains 2 n's: n₁ for the df upon which s₁² is based, and n₂ as the df for s₂². This means that there is a sampling distribution curve of F for each possible combination of n₁ and n₂. The probability table for F must accordingly be entered with n₁ and n₂ in order to learn what level of significance a given F reaches. To use Table F of the Appendix, we take the larger of the 2 variance estimates as the numerator in computing F, and the df for this larger estimate is symbolized as n₁, regardless of any system of subscripts that may have been used to designate the 2 groups. Thus the F that is used with the table is always unity or greater, even though the sampling distribution of F involves values less than unity. That is, if we were drawing successive samples from groups A and B and each time took F as s_A²/s_B² regardless of which was the larger estimate, the sampling distribution of F would obviously involve values below unity as well as above unity. The table, however, is set up in terms of the greater-than-unity side of the sampling distribution.
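The conversion between Fisher's z and the variance ratio, and the rule for entering Table F, each take a line or two. In this sketch z_from_f and f_from_z are formula (92) and its inverse, and arrange_for_table (a hypothetical helper) puts the larger estimate on top and reports n₁ and n₂ accordingly.

```python
import math


def z_from_f(f_ratio):
    """Fisher's z from a variance ratio, formula (92): z = (1/2) log_e F."""
    return 0.5 * math.log(f_ratio)


def f_from_z(z):
    """Inverse of formula (92): F = e^(2z)."""
    return math.exp(2 * z)


def arrange_for_table(var_a, df_a, var_b, df_b):
    """Put the larger estimate on top, as Table F requires; its df becomes n1."""
    if var_a >= var_b:
        return var_a / var_b, df_a, df_b
    return var_b / var_a, df_b, df_a


print(round(f_from_z(0.7450), 2))                    # the .01 point of z quoted in the text, as an F
F, n1, n2 = arrange_for_table(50.21, 7, 147.62, 8)   # the worked example later in the text
print(f"F = {F:.2f}, n1 = {n1}, n2 = {n2}")
```

e^(2 × .7450) = e^1.49 ≈ 4.44, the ratio quoted for the .01 point.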

If one wishes to judge whether 2 samples, either large or small, yield a difference in variability which is large enough to warrant concluding that the 2 population variabilities differ, he sets up the null hypothesis that no difference exists in the 2 population variances. Then, instead of dealing as usual with the difference between the 2 estimates, he takes their ratio. Obviously, the departure of this ratio, or F, from unity reflects or depends upon the difference between the 2 variance estimates. If the value of F, computed with the larger estimate in the numerator, is so large that it is not reasonable to believe it a chance deviation from a true value of unity, the null hypothesis is rejected, and it is concluded that the 2 populations do not have the same variance. If F is small, i.e., near unity, the null hypothesis is accepted.
Now it happens that, although the F values given in Table F for the .05, the .01, and the .001 levels of significance hold for the major and very extensive uses of the F table to be discussed in Chapters 15, 16, and 17, these values are not applicable to the simple case where we wish to ascertain the probability of as great a difference (irrespective of direction, i.e., a hypothesis or decision requiring a two-tailed test) between the variances for 2 groups. For this particular case, an F which falls at, say, the .01 level signifies that as large a difference in one direction would occur 1 per cent of the time by chance. This is so because in placing the larger estimate in the numerator we are considering only 1 tail of the F distribution. In asking whether 2 variance estimates of, say, 10 and 25 based on 2 groups differ, i.e., lead to an F which departs significantly from unity (no difference), we should consider not only the probability of securing an F as large as 25/10 but also the probability of obtaining one as small as 10/25. This, it will be observed, is exactly analogous to considering both positive and negative values for the z of formula (91) and then raising the question as to the probability of obtaining on a chance basis as large a difference, irrespective of direction. If we had this last probability, we would halve it to obtain the P for 1 direction only; conversely, if we had an F which fell at the P = .01 level in the table, we would need to double .01 to secure the probability for as large a difference irrespective of direction. In other words, for this particular case, that of testing the significance of the difference between the variabilities for 2 groups, an F at the .01 point of the table means significance at the .02 level, an F at the .05 level means significance at the .10 level, and an F at the .001 level indicates significance at the .002 level. We will not have to make this type of adjustment when we come to the principal uses of F in connection with the analysis of variance.
For example, suppose that 50.21 and 147.62 are variance estimates available for 2 samples of 8 and 9 cases respectively. The respective df's would be 7 and 8. In computing F we have 147.62/50.21 = 2.94, and $n_1$ becomes 8, with $n_2 = 7$. Turning to Table F, we see that F would need to be 3.73 for the .05 level, which for this type of problem is the .10 level. Therefore the null hypothesis is not rejected. If we take the square roots of the 2 variance estimates, we get s's of 7.09 and 12.15. By the F test, we are in effect saying that the difference between these 2 s's is not significant. As usual, this does not prove the null hypothesis; it becomes acceptable because we cannot with sufficient certainty reject it.
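As a check on the arithmetic, here is a minimal Python sketch of the two-tailed comparison just described. It assumes the critical value has already been read from Table F by hand; the function name `two_tailed_variance_f` and its parameters are illustrative, not from the text. The larger estimate goes in the numerator, each df is paired with its own estimate, and the tabled level is doubled because the direction of the difference was not specified in advance.

```python
def two_tailed_variance_f(var_a, n_a, var_b, n_b, f_crit, table_alpha):
    """Compare two independent variance estimates with a two-tailed F test.

    var_a, var_b: unbiased variance estimates; n_a, n_b: sample sizes.
    f_crit: critical F read from a one-tailed table (such as Table F) at
    level table_alpha for the degrees of freedom returned below.
    Returns (F, df_num, df_den, two_tailed_alpha, significant).
    """
    # The larger estimate goes in the numerator, as the table requires.
    if var_a >= var_b:
        big, n_big, small, n_small = var_a, n_a, var_b, n_b
    else:
        big, n_big, small, n_small = var_b, n_b, var_a, n_a
    f = big / small
    df_num, df_den = n_big - 1, n_small - 1
    # Either variance could have come out larger, so the tabled
    # one-tailed level is doubled to give the two-tailed level.
    return f, df_num, df_den, 2 * table_alpha, f >= f_crit


# Example from the text: 50.21 (8 cases) vs. 147.62 (9 cases);
# Table F gives 3.73 at the .05 point for 8 and 7 df.
f, df1, df2, alpha, sig = two_tailed_variance_f(50.21, 8, 147.62, 9, 3.73, 0.05)
print(f"F = {f:.2f} with df {df1} and {df2}; "
      f"significant at the {alpha:.2f} level: {sig}")
```

Run as written, it prints `F = 2.94 with df 8 and 7; significant at the 0.10 level: False`, in agreement with the conclusion above.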

If the research hypothesis being tested or the decision to be made calls for a one-tailed test, the F values in Table F are applicable without further ado. As a matter of fact, if the null hypothesis is to be accepted unless $s_a^2$ is significantly larger than $s_b^2$, one would not bother to compute F if $s_a^2$ turned out to be smaller than $s_b^2$.

Differences between several independent variances. We have seen in the last chapter that $\chi^2$ can be used to provide an over-all test of the difference between several independent proportions (p. 235) for C groups and also between C correlated proportions (p. 232). In the next chapter we shall see how an over-all test can be made for the differences between several means, either correlated or independent. We shall consider now an over-all test of the difference between 3 or more variance estimates. This test is not applicable when the variances are correlated (based on the same group or matched groups).

Suppose we have k variance estimates, $s_1^2, s_2^2, \ldots, s_i^2, \ldots, s_k^2$, based on $m_1 - 1, m_2 - 1, \ldots, m_i - 1, \ldots, m_k - 1$ degrees of freedom respectively. Let N be the sum of the m's. Compute the products: each $s_i^2$ times its df. Sum these k products (the equivalent of summing the k sums of squares of deviations). Let $s^2$ stand for this sum divided by $N - k$. Determine the log of each of the k $s_i^2$ values, then calculate the products, each $\log s_i^2$ times the df for the given $s_i^2$. Sum these products, that is, $\sum(m_i - 1)\log s_i^2$, in which i takes on values from 1 to k. Determine the log of $s^2$ and compute

$$C = 1 + \frac{1}{3(k-1)}\left(\sum\frac{1}{m_i - 1} - \frac{1}{N - k}\right)$$

Finally, calculate the quantity

$$V = \frac{2.3026}{C}\left[(N - k)\log s^2 - \sum(m_i - 1)\log s_i^2\right] \qquad (94)$$

in which the logs are common (base 10) logarithms; the factor 2.3026 converts them to natural logarithms.

The sampling distribution of V approximately follows the $\chi^2$ distribution with $k - 1$ degrees of freedom. If V reaches the P = .05 or P = .01 or any a priori chosen level of significance, the differences between the k variances may be regarded as nonchance, hence the conclusion that the k groups have not been drawn from populations having equal variances. If V is not significant, one accepts the hypothesis that the groups have been drawn from populations having equal variances; the variances are said to be homogeneous. The procedure just described is known as Bartlett's test for the homogeneity of variances. It is appropriate for testing the assumption of homoscedasticity in bivariate correlation scattergrams.
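The steps leading to formula (94) can be sketched in Python as follows. This is a minimal sketch, assuming the k variance estimates are unbiased and independent; `bartlett_v` is a hypothetical helper name, and the three variances in the usage line are hypothetical. Natural logarithms are used throughout, which is why the factor 2.3026 (the conversion from common logs) does not appear.

```python
import math


def bartlett_v(variances, sizes):
    """Bartlett's statistic V, formula (94), for k independent variance estimates.

    variances: the k unbiased estimates s_i^2; sizes: the k sample sizes m_i.
    Natural logs are used, so no 2.3026 factor is needed.
    Returns (V, df); refer V to chi square with df = k - 1.
    """
    k = len(variances)
    dfs = [m - 1 for m in sizes]            # m_i - 1 for each estimate
    n_minus_k = sum(dfs)                    # N - k
    # Pooled estimate s^2: sum of (df * variance) divided by N - k.
    pooled = sum(d * v for d, v in zip(dfs, variances)) / n_minus_k
    # Bracketed term of formula (94), in natural logs.
    bracket = n_minus_k * math.log(pooled) - sum(
        d * math.log(v) for d, v in zip(dfs, variances))
    # Correction factor C.
    c = 1 + (sum(1 / d for d in dfs) - 1 / n_minus_k) / (3 * (k - 1))
    return bracket / c, k - 1


v, df = bartlett_v([50.21, 147.62, 90.0], [8, 9, 12])
print(f"V = {v:.2f} on {df} df")
```

The printed V is then compared with a $\chi^2$ table for $k - 1$ df. If SciPy happens to be available, `scipy.stats.bartlett` computes the same corrected statistic, but from the raw samples rather than from the variance estimates.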



CHAPTER 15 


Analysis of Variance: Simple 


The F, or variance, ratio defined in the previous chapter is applicable in a wide variety of situations. The general requirement is that we have 2 independent estimates of variance, which estimates are, on the basis of the null hypothesis, regarded as estimates of the same population value. If F is sufficiently large, the null hypothesis becomes suspect, and one draws a positive conclusion, the nature of which depends upon the given situation. Each application in this and the following chapter requires an assumption of normality and an assumption of homogeneity of certain variances; normality of what, and homogeneity of which variances, will need to be specified for each type of situation.

It will be recalled that under certain circumstances the squared correlation coefficient is interpretable in terms of the proportion of variance "explained." The idea is that variation can be broken down into component parts in such a way as to permit specification of the relative importance of the component sources. Back of this is the fact that variances are additive to a total variance, as shown when we derived formulas (37) and (37a), which are basic to the so-called variance theorem. Although this theorem is fundamental to the analysis of variance technique, it is not our aim to consider methods of estimating the proportion or percentage of variance due to a given source but rather to discuss ways of testing whether a possible source is contributing to the total variance to a statistically significant degree.

BREAKDOWN OF SUM OF SQUARES

Let us begin with the simple situation in which the total variation for a set of scores based on N individuals is possibly due in part to the fact that the total group is heterogeneous with respect to some factor, such as socioeconomic level or age or racial origin or type of treatment or method used in memorizing or varying level of illumination; in short, any factor which permits breaking down the total group into subgroups. In other words, the individuals or their scores can be classified into subgroups, or the total group can be regarded as made up of specified subgroups. For simplicity, let us assume that the subgroups are of the same size, say m cases per group, and that we have k groups. Let r stand for any subgroup; i.e., r takes on values of 1, 2, 3, ..., k; and let the mean scores for the groups be specified as $\bar X_1, \bar X_2, \ldots, \bar X_r, \ldots, \bar X_k$, with $\bar X$ as the mean for all groups combined (total mean). Although it is possible to use a precise notation, such as $X_{ir}$ to denote the score of any, the ith, person in group r, we shall in this chapter simply use X as the score for any individual.
We are now in a position to write an individual's score as a deviation from the total mean in terms of the deviation of his score from his group mean and the deviation of the group mean from the total mean. Thus, for a score in group r,

$$(X - \bar X) = (X - \bar X_r) + (\bar X_r - \bar X) \qquad (95)$$

which indicates 2 sources of variation: the variation of a group mean from the total mean and the variation of an individual's score from his group mean.

If we rewrite formula (95) specifically for group 1, we have

$$(X - \bar X) = (X - \bar X_1) + (\bar X_1 - \bar X)$$

Squaring both sides gives

$$(X - \bar X)^2 = (X - \bar X_1)^2 + (\bar X_1 - \bar X)^2 + 2(\bar X_1 - \bar X)(X - \bar X_1)$$

as the squared deviation, from the total mean, of any score in group 1. Each of the m persons in the group will have such a squared deviation score. We may indicate the sum of the squares for the m cases as

$$\sum^{m}(X - \bar X)^2 = \sum^{m}(X - \bar X_1)^2 + \sum^{m}(\bar X_1 - \bar X)^2 + 2(\bar X_1 - \bar X)\sum^{m}(X - \bar X_1)$$

Note that in the last term the constants 2 and $(\bar X_1 - \bar X)$ have been taken from under the summation sign, and that $\sum(X - \bar X_1)$, being the sum of deviations of a set of scores about their own mean, will be exactly zero. Therefore, the last term vanishes.



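Note also that the second right-hand term involves summing a constant, which is the same as multiplying it by the number of cases involved in the summation, i.e., $\sum^{m}(\bar X_1 - \bar X)^2 = m(\bar X_1 - \bar X)^2$.

Thus we see that we may write the sum of squares (of deviations) for the first group, and by analogy for the other groups, as follows:

$$\begin{aligned}
\text{1st group:}\quad & \sum(X - \bar X)^2 = \sum(X - \bar X_1)^2 + m(\bar X_1 - \bar X)^2\\
\text{2nd group:}\quad & \sum(X - \bar X)^2 = \sum(X - \bar X_2)^2 + m(\bar X_2 - \bar X)^2\\
& \qquad\vdots\\
r\text{th group:}\quad & \sum(X - \bar X)^2 = \sum(X - \bar X_r)^2 + m(\bar X_r - \bar X)^2\\
& \qquad\vdots\\
k\text{th group:}\quad & \sum(X - \bar X)^2 = \sum(X - \bar X_k)^2 + m(\bar X_k - \bar X)^2
\end{aligned}$$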

If we summed the left-hand parts of the foregoing, we would obviously have the sum of squares of deviations for the entire set of N = km cases. This summing of sums, or double summation, can be conveniently indicated by using 2 summation signs, or $\sum\sum(X - \bar X)^2$. We may sum the right-hand terms separately. The first term on the right involves summing sums, and the result can be indicated symbolically by $\sum^{r}\sum(X - \bar X_r)^2$, which implies that we first sum for each group, then sum over all groups. The first summation sign indicates that the subscript r takes in turn values running from 1 to k. The sum of the other right-hand terms can be written as $m\sum^{r}(\bar X_r - \bar X)^2$.

Since adding of equations leads to an equation, we have

$$\sum\sum(X - \bar X)^2 = \sum\sum(X - \bar X_r)^2 + m\sum(\bar X_r - \bar X)^2 \qquad (96)$$

as a means of expressing the fact that the total sum of squares (of deviations) can be broken down into 2 components, the first of which has to do with variation about group means, i.e., within groups, and the second of which involves variation of group means about the total mean, i.e., between groups. In other words, the total sum of squares is made up of 2 additive parts. If we divide both sides by N or km, we have the total variance broken into additive components, but for our present purposes we shall need unbiased estimates of variance, and hence it becomes necessary to divide through by degrees of freedom.
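To see the breakdown of formula (96) numerically, the following Python sketch computes the three sums of squares directly from deviations. The scores (3 groups of 4 cases) and the helper name `sum_of_squares_breakdown` are hypothetical, chosen only for illustration. The cross-product term is never computed, because, as shown above, it is exactly zero.

```python
def sum_of_squares_breakdown(groups):
    """Split the total sum of squares into within- and between-groups parts.

    groups: list of k lists of scores (equal sizes m, as in formula (96),
    although the arithmetic does not require it).
    Returns (total, within, between) sums of squared deviations.
    """
    scores = [x for g in groups for x in g]
    grand_mean = sum(scores) / len(scores)
    total = sum((x - grand_mean) ** 2 for x in scores)
    within = 0.0
    between = 0.0
    for g in groups:
        group_mean = sum(g) / len(g)
        # Deviations of scores about their own group mean.
        within += sum((x - group_mean) ** 2 for x in g)
        # m times the squared deviation of the group mean from the total mean.
        between += len(g) * (group_mean - grand_mean) ** 2
    return total, within, between


# Hypothetical scores: k = 3 groups of m = 4 cases each.
data = [[3, 5, 4, 6], [7, 9, 8, 8], [10, 12, 11, 13]]
total, within, between = sum_of_squares_breakdown(data)
print(total, within, between, within + between)
```

For these scores it prints `110.0 12.0 98.0 110.0`: the within (12) and between (98) parts add exactly to the total (110).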

The correct df can be ascertained by examining the 3 sums of squares. For the total sum of squares there is 1 restriction, the total mean, and, as seen in Chapter 7, p. 106, the df will be $N - 1$ or $km - 1$. The within-groups sum is based on N or km squares, but since these are about k different means there are k restrictions, or $km - k$ (= $N - k$) degrees of freedom. The last, or between-groups, sum involves k means, varying more or less about the total mean; thus, aside from the m factor, it contains k squares with 1 restriction, and the df becomes $k - 1$. In other words, the k means are analogous to varying scores, and obviously the mean of these means will equal the total mean.
We may indicate the division of the 3 sums of squares by the proper df's as follows:

$$\frac{\sum\sum(X - \bar X)^2}{km - 1}; \qquad \frac{\sum\sum(X - \bar X_r)^2}{km - k}; \qquad \frac{m\sum(\bar X_r - \bar X)^2}{k - 1}$$

Notice that we are no longer dealing with an equation. Why? Each division will result in a variance estimate, but because the 3 sums are divided by different df's these estimates are not directly additive, which means that we cannot specify what proportion of the estimated total variance is due to the between-groups variation. The reader should note, however, that the df's are additive: $(km - 1) = (km - k) + (k - 1)$.

Before examining the meaning of these 3 variance estimates, let us label them: $s^2$ for the estimate of total variance, $s_w^2$ for that based on the within-groups sum of squares, and $s_b^2$ for that based upon between groups. Variance estimates are sometimes referred to as "mean squares."

MEANING OF VARIANCE ESTIMATES 

In so far as one thinks of the total km cases as a sample drawn from 1 population, $s^2$ will be the best unbiased estimate of the variance of the population, $\sigma^2$. If we think of the m cases for each of our k groups as samples from k possibly different populations, then $s_w^2$ will be a composite estimate of the several population variances, a sort of average which makes sense if the population variances are equal; if the k groups have been drawn from just 1 population, this within-groups variance estimate, $s_w^2$, will differ little from, but be somewhat smaller than, $s^2$. Note that $s^2$ and $s_w^2$ cannot be regarded as independent estimates because the 2 estimates are based on practically the same deviations: extreme scores, in either direction, will tend to make both $s^2$ and $s_w^2$ large. If m, or the number of cases per group, is taken larger and larger, and if the groups are regarded as belonging to the same population or to populations differing in some respects but having the same mean and variance for the given trait or variate, $s^2$ and $s_w^2$ will tend to the same value.

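Let us next look at $s_b^2$. The division of $m\sum(\bar X_r - \bar X)^2$ by its df may be accomplished by dividing the sum factor by $k - 1$. In making this division we are dividing a sum of squares by degrees of freedom; hence the result will be a variance estimate. Let us use $s_{\bar x}^2$ as a symbol for this estimate. Then

$$s_{\bar x}^2 = \frac{\sum(\bar X_r - \bar X)^2}{k - 1} \qquad\text{and}\qquad s_b^2 = \frac{m\sum(\bar X_r - \bar X)^2}{k - 1} = m\,s_{\bar x}^2$$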

In order to understand the meaning of $s_b^2$ we may regard our k means as a sample of sample means from an indefinitely large supply of possible sample means for groups drawn from the same population. The variance for this universe of sample means is given by the standard error of the mean formula, i.e., $\sigma_{\bar x}^2 = \sigma^2/m$.

If we were given the value of $\sigma_{\bar x}^2$ and asked to determine the universe trait variance, or $\sigma^2$, we would simply solve $\sigma_{\bar x}^2 = \sigma^2/m$ for $\sigma^2$. Thus, $\sigma^2 = m\sigma_{\bar x}^2$. If we had only an estimate of $\sigma_{\bar x}^2$, such as $s_{\bar x}^2$, we could use this estimate as a basis for estimating the trait variance; i.e., $m s_{\bar x}^2$ can be taken as an estimate of $\sigma^2$. Since $m s_{\bar x}^2 = s_b^2$, we have $s_b^2$ and $s_w^2$ (see above) as estimates of the same population variance.

These estimates should agree within the limits of chance, and since they are independent estimates of the same variance, the sampling distribution of their ratio is that of the F distribution. When an obtained F, or $s_b^2/s_w^2$, is larger than expected on the basis of chance sampling, the implication is that $s_{\bar x}^2$ is greater than expected by chance. How could this come about? Let us suppose that our k groups of m cases each have been drawn from k different populations, i.e., from populations with means which really differ. Under this circumstance the variation of the k sample means will spring from 2 sources. A part of the variance of the means will be due to sampling variation predictable by the formula for the standard error of the mean on the basis of m and the trait variance. A second part of the variation in means will be due to the variation of the true (population) means of the k groups.

If we let $\sigma_{\bar x}^2$ represent the variance of obtained means and $\sigma_{\mu}^2$ the variance of the true group means, and if the several groups have the same population variance, $\sigma^2$ (this is the assumption of homogeneity of variances), we should expect the following to hold exactly for an infinitely large number of groups and approximately for a small number of groups:

$$\sigma_{\bar x}^2 = \frac{\sigma^2}{m} + \sigma_{\mu}^2$$

This is analogous to the commonly accepted expression used in connection with test reliability; namely, that the variance of obtained scores equals the variance of true scores plus error (of measurement) variance. Multiplying the above by m, we have

$$m\sigma_{\bar x}^2 = \sigma^2 + m\sigma_{\mu}^2$$

Thus, since m times the obtained variance of group means can be broken down into 2 components, it should be obvious that the estimate $s_b^2$ (= $m s_{\bar x}^2$) may also be subject to 2 sources of variation.
In practice we don't have a priori knowledge of whether $\sigma_{\mu}^2$ is or is not zero. What we have are 2 estimates of the population trait variance, that based on $s_b^2$ (or $m s_{\bar x}^2$) and that based on $s_w^2$. If the $s_b^2$ estimate is significantly larger than $s_w^2$, i.e., if F, or $s_b^2/s_w^2$, is beyond the point for the P = .01 level of significance, it can be argued that $s_b^2$ involves a source of variation over and above that of random sampling errors in the means, and hence that $\sigma_{\mu}^2$ is real. This is, of course, equivalent to concluding that our k groups of m cases have been drawn from populations with real differences in their population means.

Although the table of F requires that the larger of the 2 estimated variances be used as the numerator in computing the variance ratio, it should be noted that $s_w^2$ cannot be significantly larger than $s_b^2$ unless the operation of chance sampling has been restricted in some manner. In practical applications we are primarily and nearly always interested in the case in which $s_b^2$ is the larger of the 2 estimates. If it is smaller than $s_w^2$, it is ordinarily not necessary to compute F.

We may now summarize the foregoing. When we have scores on k groups of m cases each, the total sum of squares can be broken down into 2 additive parts, that for between and that for within groups. When each is divided by the appropriate degrees of freedom, the within sum of squares gives $s_w^2$ as an estimate of the trait variance for the population, and $s_b^2$ (= $m s_{\bar x}^2$) yields a second and independent estimate of the same population variance. The sampling variation of the ratio of these 2 estimates is that of the variance ratio, F, if the k groups belong to the same population. If $s_b^2$ is significantly larger than $s_w^2$, which is an estimate of the population variance, $s_b^2$ must be regarded as an estimate of the same variance plus variation due to real, nonchance, differences between the k groups.

If we let the symbol $\doteq$ stand for "is an estimate of," then

$$s_w^2 \doteq \sigma^2 \qquad\text{and}\qquad s_b^2 \doteq \sigma^2 + m\sigma_{\mu}^2$$

The null hypothesis is that $\sigma_{\mu}^2$ is zero, and rejection of this hypothesis because F is significantly large implies that $\sigma_{\mu}^2$ is not zero, or that the k groups have not been drawn from the same population (or from populations with equal means). In other words, we have a technique that provides an over-all test for the significance of the differences between several means considered simultaneously.

For the applications discussed in this chapter, it is assumed (1) that the m cases constituting each group have been drawn from a normally distributed population of scores for the trait or variable as measured, and (2) that the k populations have the same variance, $\sigma^2$. For large samples the first assumption can be checked by way of measures of skewness and kurtosis relative to their standard errors or by the chi square test of goodness of fit. Unfortunately neither of these checks is very sensitive for small samples. The second assumption may be evaluated, regardless of sample sizes, by Bartlett's test for the homogeneity of variances (p. 248). The reader will have noted that these 2 assumptions have to do with the distribution of scores within groups, which leads to the denominator, $s_w^2$, of F.
There is some evidence that moderate departure from normality and moderate lack of homogeneity of variances do not seriously disrupt the applicability of the technique. It is not easy to give a definition of "moderate," but it is known that violation of these assumptions leads to too many "significant" F's. For example, an F which is apparently (from Table F) significant at the .05 level may really be significant only at the .07 level. One can guard against erroneously rejecting the null hypothesis by choosing a more stringent level for judging significance.

Computational formulas. The required arithmetical labor can be shortened by resort to the general principle for computing the sum of squares of deviations inherent in formula (6a), p. 25:

$$\sum(X - \bar X)^2 = \sum X^2 - \frac{(\sum X)^2}{N} = \frac{1}{N}\left[N\sum X^2 - (\sum X)^2\right]$$

Thus we would have

$$\sum\sum(X - \bar X)^2 = \frac{1}{N}\left[N\sum\sum X^2 - (\sum\sum X)^2\right] \qquad (97a)$$

for the total sum of squares, in which the double summation indicates that the summing is over all groups. It can be shown by easy algebra that

$$\sum\sum(X - \bar X_r)^2 = \frac{1}{m}\left[m\sum\sum X^2 - \sum(\sum X)^2\right] \qquad (97b)$$

for the within sum of squares and that

$$m\sum(\bar X_r - \bar X)^2 = \frac{1}{km}\left[k\sum(\sum X)^2 - (\sum\sum X)^2\right] \qquad (97c)$$

for the between sum of squares.

Accordingly, to compute the 3 sums of squares of deviations, we need to sum all the raw scores, $\sum\sum X$; sum the squares of all the raw scores, $\sum\sum X^2$; and sum the squares of the separate group sums, $\sum(\sum X)^2$. These sums can readily be obtained on a calculating machine by computing $\sum X$ and $\sum X^2$ separately for each group, squaring each $\sum X$, and then summing the several $\sum X$ values for $\sum\sum X$, the $\sum X^2$ values for $\sum\sum X^2$, and the $(\sum X)^2$ values for $\sum(\sum X)^2$.
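Formulas (97a) to (97c) need only the column totals that appear at the foot of Table 36. A minimal Python sketch, assuming equal group sizes (the function name `sums_of_squares_from_totals` is hypothetical), reproduces the three sums of squares from those totals:

```python
def sums_of_squares_from_totals(k, m, sum_x, sum_x2, sum_sq_group_sums):
    """Formulas (97a)-(97c): sums of squares from machine totals, equal group size m.

    sum_x: grand sum of all raw scores; sum_x2: grand sum of squared raw scores;
    sum_sq_group_sums: sum over groups of (group sum) squared.
    """
    n = k * m
    total = (n * sum_x2 - sum_x ** 2) / n                      # (97a)
    within = (m * sum_x2 - sum_sq_group_sums) / m              # (97b)
    between = (k * sum_sq_group_sums - sum_x ** 2) / (k * m)   # (97c)
    return total, within, between


# Column totals at the foot of Table 36 (k = 5 groups of m = 16 cases).
group_sums = [57, 102, 146, 172, 215]
group_sums_of_squares = [279, 768, 1550, 1982, 3059]
total, within, between = sums_of_squares_from_totals(
    k=5, m=16,
    sum_x=sum(group_sums),
    sum_x2=sum(group_sums_of_squares),
    sum_sq_group_sums=sum(s ** 2 for s in group_sums),
)
print(f"total {total:.2f}, within {within:.2f}, between {between:.2f}")
```

The three values agree with the substitutions in formulas (97) worked out for Wright's data: 1652.20, 714.38, and 937.82 (the last is 937.825 before rounding).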

EXAMPLE: TESTING THE SIGNIFICANCE OF DIFFERENCES BETWEEN SEVERAL MEANS

To illustrate the application of the technique outlined above we shall use unpublished data of Wright* on massed vs. distributed practice in the learning of nonsense syllables by the anticipation method. The essential comparison is based on the amount of learning shown in 34 minutes by 5 groups of 16 (= m) cases each. The groups differed in length of rest intervals between trials and/or in the total number of trials, as indicated at the top of Table 36. The scores of all 80 subjects are included in this table, and the necessary sums are given at the bottom of the table, separately for each group. Summing across yields the required double sums. The group means are also given, although not actually needed in determining F.

* Wright, Suzanne T., Spacing of practice in verbal learning and the maturation hypothesis. Unpublished Master's Thesis, Stanford University, California, 1946.

Table 36. Number of Syllables Correctly Anticipated at the 34th Minute of Practice

Group                       1       2       3       4       5
Rest interval (minutes)     8       3.5     2       1       0
Number of trials            5       8       11

                            5       8       9       11      17
                            5       7       3       12      16
                            1       4       9       15      18
                            5       4       10      11      11
                            8       7       5       10      15
                            1       7       11      8       9
                            2       5       9       13      13
                            2       6       6       13      13
                            2       8       7       5       12
                            8       14      6       7       15
                            4       8       16      11      8
                            1       5       12      12      13
                            3       1       11      12      7
                            4       5       15      9       15
                            4       8       13      16      15
                            2       5       4       7       13

m                           16      16      16      16      16
$\sum X$                    57      102     146     172     215      $\sum\sum X$ = 692
$\sum X^2$                  279     768     1,550   1,982   3,059    $\sum\sum X^2$ = 7,638
$(\sum X)^2$                3,249   10,404  21,316  29,584  46,225   $\sum(\sum X)^2$ = 110,778
Means                       3.56    6.38    9.12    10.75   13.44    $\bar X$ = 8.65

The sums of squares (of deviations) are obtained by substituting in formulas (97):

$$\sum\sum(X - \bar X)^2 = \frac{1}{80}\left[80(7638) - (692)^2\right] = 1652.20$$

$$\sum\sum(X - \bar X_r)^2 = \frac{1}{16}\left[16(7638) - 110{,}778\right] = 714.38$$

$$m\sum(\bar X_r - \bar X)^2 = \frac{1}{80}\left[5(110{,}778) - (692)^2\right] = 937.82$$

These sums of squares, along with the respective degrees of freedom and the resulting variance estimates, are conveniently arranged in Table 37, usually referred to as a variance table. Note that the sums of squares for between and within groups add to the sum for the total, which provides a check on the arithmetic involved in substituting in formulas (97). This does not check on the accuracy of the sums given in Table 36. Note also that the degrees of freedom add to the total df.

Table 37. Variance Table for Data of Wright

Source      Sum of Squares    df    Variance Estimate
Between     937.82            4     234.46 = $s_b^2$
Within      714.38            75    9.53 = $s_w^2$
Total       1652.20           79
The variance ratio, or $s_b^2/s_w^2$, becomes 234.46/9.53, or 24.60. With df's of $n_1 = 4$ and $n_2 = 75$, we refer to the table of F to learn whether 24.60 is larger than expected on the basis of chance. That F is highly significant is immediately apparent when we note that for the given df's an F of about 5.2 is significant at the .001 level. With the between-groups variance estimate significantly larger than that for within groups, we can conclude with high confidence that the 5 sets of scores have not been drawn from the same population of scores, or that amount of time spent in practice is a real source of variation. This is, of course, equivalent to saying that the several group means, considered simultaneously, differ significantly.
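The layout of Table 37 and the variance ratio can be produced by a short Python sketch. The helper `variance_table` is hypothetical; its inputs are the sums of squares and the numbers of groups and cases from Wright's data.

```python
def variance_table(ss_between, ss_within, k, n):
    """Assemble a one-way variance table and the ratio F = s_b^2 / s_w^2.

    k groups, n cases in all; degrees of freedom k - 1, n - k, and n - 1.
    """
    df_b, df_w = k - 1, n - k
    s2_b = ss_between / df_b   # between-groups variance estimate (mean square)
    s2_w = ss_within / df_w    # within-groups variance estimate (mean square)
    rows = [
        ("Between", ss_between, df_b, s2_b),
        ("Within", ss_within, df_w, s2_w),
        ("Total", ss_between + ss_within, n - 1, None),
    ]
    return rows, s2_b / s2_w


rows, f = variance_table(937.82, 714.38, k=5, n=80)
for source, ss, df, est in rows:
    est_text = "" if est is None else f"{est:.2f}"
    print(f"{source:<8}{ss:>10.2f}{df:>5}{est_text:>10}")
print(f"F = {f:.2f} with {rows[0][2]} and {rows[1][2]} df")
```

It reports F = 24.61 rather than 24.60 because the text divides the rounded estimates 234.46 and 9.53; the conclusion is unchanged, since an F of about 5.2 already reaches the .001 level for 4 and 75 df.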

In the illustration just given the groups can be arranged in order before any of the data are seen, and additional credence can be placed in the results because the means follow this ordering. It should be understood, however, that the variance technique does not presuppose an a priori ordering of the several groups; it is generally applicable for testing the significance of the differences between group means regardless of prior considerations. If one had available only the CR or t technique and wished to compare the means for 5 groups, it would ordinarily be necessary to compute t or CR for each possible difference, and 5 means would lead to 5 × 4/2, or 10, differences. Obviously, the variance method requires less computation, and furthermore it provides an over-all test of significance which is not subject to the fallacy inherent in singling out the comparison involving the largest obtained t or CR, a practice which is likely to capitalize on chance differences. After, and only after, it has been found that the over-all F is significant can one safely use the t technique to test the significance of the difference between any 2 of the group means. When we do this, $s_w^2$ is used for the $s^2$ required in the formula for t, p. 109. Thus, to check the significance of the difference between the means for groups 1 and 2 of the Wright data, we have

$$t = \frac{6.38 - 3.56}{\sqrt{9.53\left(\dfrac{1}{16} + \dfrac{1}{16}\right)}} = \frac{2.82}{1.09} = 2.59$$

The variance estimate here used is based on 75 degrees of freedom; hence this t may be entered as a CR in the normal probability table. It is significant at the .01 level. Since group 1 differs still more from the remaining 3 groups, one would not bother to compute additional t's for comparisons involving group 1. Actually the testing of the means for nonadjacent groups would scarcely be necessary, but note that, since the groups are of the same size, the t between any 2 means in Table 36 will involve the same denominator, 1.09, already used. The use of $s_w^2$ as the $s^2$ for the t test is logical in that $s_w^2$ is based on all the available scores and hence is more dependable than an estimate based on just 2 groups.
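The follow-up comparisons can be sketched in Python as below, assuming equal group sizes and using the unrounded within-groups estimate $714.375/75$; the helper name `t_with_pooled_within` is hypothetical. It computes t for each pair of adjacent groups in Table 36, all of which share the same denominator.

```python
import math


def t_with_pooled_within(mean_a, mean_b, m_a, m_b, s2_within):
    """t for two group means, using the within-groups estimate s_w^2 as s^2."""
    return (mean_a - mean_b) / math.sqrt(s2_within * (1 / m_a + 1 / m_b))


# Group sums from Table 36 and the unrounded within-groups estimate.
group_sums = [57, 102, 146, 172, 215]
m = 16
means = [s / m for s in group_sums]
s2_w = 714.375 / 75
for r in range(len(means) - 1):
    t = t_with_pooled_within(means[r + 1], means[r], m, m, s2_w)
    print(f"groups {r + 1} and {r + 2}: t = {t:.2f}")
```

With unrounded figures the groups 1 and 2 comparison gives $t \approx 2.58$; the 2.59 in the text comes from dividing the rounded 2.82 by the rounded 1.09. As cautioned above, such t's are meaningful only after the over-all F has proved significant.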

SPECIAL CASE OF F TEST WHEN $n_1 = 1$

If we had k = 2 groups, the testing of the between-groups variance would appear to be much like testing the difference between 2 means. Let us examine this case by starting with the expressions for the sum of squares for 2 groups.

$$\begin{aligned}
\text{1st group:}\quad & \sum(X - \bar X)^2 = \sum(X - \bar X_1)^2 + m(\bar X_1 - \bar X)^2\\
\text{2nd group:}\quad & \sum(X - \bar X)^2 = \sum(X - \bar X_2)^2 + m(\bar X_2 - \bar X)^2
\end{aligned}$$

Instead of using double summation signs, we may indicate the within-groups sum of squares as $\sum(X - \bar X_1)^2 + \sum(X - \bar X_2)^2$, and the between-groups sum of squares as $m(\bar X_1 - \bar X)^2 + m(\bar X_2 - \bar X)^2$. The respective df's will be $2m - 2$ and 1. Indicating the division of the sums of squares by their df's, we can write the variance ratio as

$$F = \frac{\left[m(\bar X_1 - \bar X)^2 + m(\bar X_2 - \bar X)^2\right]/1}{\left[\sum(X - \bar X_1)^2 + \sum(X - \bar X_2)^2\right]/(2m - 2)}$$

Since the number of cases for the 2 groups is the same, it is readily seen that the mean for 1 group will be exactly as far above the general mean ($\bar X$) as the other group mean is below $\bar X$, or that $\bar X$ will bisect the distance between $\bar X_1$ and $\bar X_2$; therefore $(\bar X_1 - \bar X)^2 = (\bar X_2 - \bar X)^2 = \frac{1}{4}(\bar X_1 - \bar X_2)^2$. The numerator for F becomes $(m/2)(\bar X_1 - \bar X_2)^2$. It will be noted that the denominator term, which defines $s_w^2$, is identical to the $s^2$ defined on p. 110 in connection with the t test. Accordingly, we may write


$$F = \frac{(m/2)(\bar X_1 - \bar X_2)^2}{s^2}$$

Dividing both numerator and denominator by m/2, we have

$$F = \frac{(\bar X_1 - \bar X_2)^2}{s^2\left(\dfrac{2}{m}\right)}$$

the square root of which is

$$\sqrt{F} = \frac{\bar X_1 - \bar X_2}{\sqrt{s^2\left(\dfrac{2}{m}\right)}} = \frac{\bar X_1 - \bar X_2}{\sqrt{s^2\left(\dfrac{1}{m} + \dfrac{1}{m}\right)}}$$

which is identical with a formula for t, p. 109. When k = 2, or 2 groups are being compared, $F = t^2$. It can be shown that this is also true when the N's or m's for the 2 groups are unequal. In fact, it can be shown that, when $n_1 = 1$, the sampling distribution of F becomes the same as that for $t^2$, provided the estimate based on between groups, i.e., that based on 1 degree of freedom, is used as the numerator regardless of which of the 2 estimates is the larger. It is thus seen that the t test is a special case of the F test. Note that F involves the square of the difference between means; hence it provides a basis for judging whether a difference between means, irrespective of direction, is significant (cf. pp. 246-247). The CR technique for comparing the means of 2 large samples is also a special case of the more general F test. That is, when $n_1 = 1$ and $n_2$ is not small, the square root of F is a CR interpretable via the normal curve table (Table A of the Appendix).
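The identity $F = t^2$ for 2 groups is easy to confirm numerically. In this Python sketch the scores are hypothetical, with group sizes of 6 and 5 so that the unequal-size case mentioned above is covered as well; `one_way_f` computes the one-way variance ratio and `pooled_t` the usual two-sample t with a pooled variance estimate (both names are illustrative).

```python
import math


def one_way_f(groups):
    """One-way F = s_b^2 / s_w^2 for k groups of any sizes."""
    scores = [x for g in groups for x in g]
    n, k = len(scores), len(groups)
    grand = sum(scores) / n
    ss_b = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_w = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    return (ss_b / (k - 1)) / (ss_w / (n - k))


def pooled_t(a, b):
    """Two-sample t using the pooled (within-groups) variance estimate."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    ss = sum((x - mean_a) ** 2 for x in a) + sum((x - mean_b) ** 2 for x in b)
    s2 = ss / (len(a) + len(b) - 2)
    return (mean_a - mean_b) / math.sqrt(s2 * (1 / len(a) + 1 / len(b)))


# Hypothetical scores for two groups of unequal size.
a = [12, 15, 11, 14, 13, 16]
b = [9, 11, 10, 12, 8]
print(one_way_f([a, b]), pooled_t(a, b) ** 2)
```

The two printed values agree apart from floating-point rounding.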

GROUPS OF UNEQUAL SIZE 

When the number of cases varies from group to group, we may let $m_1, m_2, \ldots, m_r, \ldots, m_k$ stand for the several N's. The sum of squares for the rth group would be written as

$$\sum(X - \bar X)^2 = \sum(X - \bar X_r)^2 + m_r(\bar X_r - \bar X)^2$$

and the double summation over all groups would be

$$\sum\sum(X - \bar X)^2 = \sum\sum(X - \bar X_r)^2 + \sum m_r(\bar X_r - \bar X)^2$$

which differs from formula (96) in that the varying m's must be left under the summation sign in the last term. In specifying the degrees of freedom, we must replace km by N, where N is the total cases for all groups. The respective df's become $N - 1$, $N - k$, and $k - 1$. The computational formulas are changed to

$$\sum\sum(X - \bar X)^2 = \sum\sum X^2 - \frac{(\sum\sum X)^2}{N} \qquad\text{for total sum} \qquad (98a)$$

$$\sum\sum(X - \bar X_r)^2 = \sum\sum X^2 - \sum\frac{(\sum X)^2}{m_r} \qquad\text{for within sum} \qquad (98b)$$

$$\sum m_r(\bar X_r - \bar X)^2 = \sum\frac{(\sum X)^2}{m_r} - \frac{(\sum\sum X)^2}{N} \qquad\text{for between sum} \qquad (98c)$$
Note that the second term for the within sum (and the first for the between) requires that for each group the square of the sum of its scores be first divided by its m; then the several quotients are summed. An additional row would be needed along the bottom of Table 36 for these quotients if the m's differed, or one might replace the $(\sum X)^2$ row by $(\sum X)^2/m_r$ values.

A variance table (like Table 37) may be formed, and F taken to equal $s_b^2/s_w^2$ as before. The same interpretation holds: if F is significantly large, i.e., if $s_b^2$ is significantly larger than $s_w^2$, the variation of the several group means among themselves is larger than expected on the basis of sampling, hence nonchance differences exist between the groups. The student who attempts, for the situation of unequal m's, to reorient the logic leading to the idea that $s_w^2$ is an estimate of $\sigma^2$ and that $s_b^2$ is an estimate of $\sigma^2$ plus a possible $m\sigma_{\mu}^2$ will encounter some difficulty. Suffice it to say here that, if $s_b^2$ is significantly larger than $s_w^2$, it can be concluded that the component involving the variance $\sigma_{\mu}^2$ is not zero. That is, when the groups have been drawn from populations having different means, $s_b^2$ may be larger than $s_w^2$ because of this additional source of variation, even though it is not easy to regard this variation in terms of $\sigma_{\mu}^2$ times a varying m.

Thus the F technique may be applied as a test of the significance of the difference between 2 or more means based on large or small samples of equal or unequal size (per group), regardless of whether there is an a priori basis for arranging the groups in order. It might be said parenthetically that the scientific hypothesis being tested will specify the direction of differences if such are expected.
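Formulas (98a) to (98c) translate directly into code. This Python sketch assumes the raw scores for each group are available as lists; the helper name `sums_of_squares_unequal` and the 3 groups of 3, 5, and 4 cases are hypothetical. It forms the quotients $(\sum X)^2/m_r$ that the extra row of Table 36 would hold and then computes F on $k - 1$ and $N - k$ df.

```python
def sums_of_squares_unequal(groups):
    """Formulas (98a)-(98c) for groups of unequal size m_r.

    Returns (total, within, between) and the df's (N - 1, N - k, k - 1).
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_sum = sum(sum(g) for g in groups)                  # sum-sum X
    grand_sum_sq = sum(x * x for g in groups for x in g)     # sum-sum X^2
    # Sum over groups of (group sum)^2 / m_r: the extra row for Table 36.
    quotients = sum(sum(g) ** 2 / len(g) for g in groups)
    correction = grand_sum ** 2 / n
    total = grand_sum_sq - correction     # (98a)
    within = grand_sum_sq - quotients     # (98b)
    between = quotients - correction      # (98c)
    return (total, within, between), (n - 1, n - k, k - 1)


# Hypothetical groups of 3, 5, and 4 cases.
groups = [[4, 6, 5], [8, 9, 7, 10, 6], [12, 11, 13, 12]]
(total, within, between), (df_t, df_w, df_b) = sums_of_squares_unequal(groups)
f = (between / df_b) / (within / df_w)
print(f"SS: {total:.3f} = {within:.3f} + {between:.3f}; "
      f"F = {f:.2f} on {df_b} and {df_w} df")
```

Computing the sums from totals in this way avoids finding any deviations, just as formulas (97) did for equal groups.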

TESTING THE SIGNIFICANCE OF THE CORRELATION RATIO

If the definitions of the correlation ratio, $\eta$ (pp. 207-208), are reexamined, it is readily seen that for 1 variable the within-arrays variance is the same as the within-groups variance, the grouping being made on the basis of intervals on another variable. Also the variance of array means is the same as the between-groups variance. We recall, however, that the correlation ratio, as defined, does not involve the idea of variance estimates. It should be rather obvious that, unless the between-arrays (groups) variance is significantly larger than expected on the basis of sampling errors in the array means, a correlation ratio cannot be deemed significant.



For purposes of exposition we shall outline the procedure for testing the significance of $\eta_{yx}$, for which we shall use the simpler symbol $\eta$. The grouping will be on the basis of the intervals on the X variable, and the required sums of squares will be in terms of Y. The sums of squares and their respective degrees of freedom will be

$$\underbrace{\sum\sum(Y - \bar Y)^2}_{N - 1} = \underbrace{\sum\sum(Y - \bar Y_r)^2}_{N - k} + \underbrace{\sum m_r(\bar Y_r - \bar Y)^2}_{k - 1}$$

for k arrays with varying number, $m_r$, of cases per array. From the definition formula of the correlation ratio, $1 - \eta^2$ is the ratio of the within-arrays variance of Y to the total variance of Y, which becomes, in the notation of this chapter,

$$1 - \eta^2 = \frac{\sum\sum(Y - \bar Y_r)^2/N}{\sum\sum(Y - \bar Y)^2/N}$$

Since N cancels, we see that the following holds:

$$\sum\sum(Y - \bar Y_r)^2 = (1 - \eta^2)\sum\sum(Y - \bar Y)^2 = \text{within sum of squares} \qquad (99)$$

From the alternate expression for $\eta$, in which $\eta^2$ is the ratio of the variance of the array means to the total variance of Y, we have

$$\eta^2 = \frac{\sum m_r(\bar Y_r - \bar Y)^2/N}{\sum\sum(Y - \bar Y)^2/N}$$

which leads to

$$\sum m_r(\bar Y_r - \bar Y)^2 = \eta^2\sum\sum(Y - \bar Y)^2 = \text{between sum of squares} \qquad (100)$$

When we wish to divide the sum of squares of formula (99) or (100) by the proper df, we may choose either the left- or right-hand part as representing the sum of squares. Thus the between-arrays estimate may be written as

$$s_b^2 = \frac{\eta^2\sum\sum(Y - \bar Y)^2}{k - 1}$$

and that for within arrays as

$$s_w^2 = \frac{(1 - \eta^2)\sum\sum(Y - \bar Y)^2}{N - k}$$

The ratio, $F = s_b^2/s_w^2$, may be written as

$$F = \frac{\eta^2\sum\sum(Y - \bar Y)^2/(k - 1)}{(1 - \eta^2)\sum\sum(Y - \bar Y)^2/(N - k)} = \frac{\eta^2/(k - 1)}{(1 - \eta^2)/(N - k)}$$

It is accordingly seen that for fixed df's the value of F, even though computed from the sums rather than from their equivalents in terms of $\eta^2$, can be thought of as depending upon the size of $\eta^2$; therefore a significant F indicates a significant correlation ratio.

With the 3 sums of squares computed, we can readily deter- 
mine whether any correlation in the sense of the correlation ratio 
exists, and we also have the necessary sums for calculating 77 if it 
is desired to have this measure of the degree of correlation A 
significant F does not, however, mean a high correlation ratio, 
with N large, a low 77 can possess statistical significance. 

The computation of the sums of squares is accomplished by
means of formulas (98) with the X's replaced by Y's.
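Formulas (98) are not reproduced in this section, so the sketch below assumes the usual computational forms, which are the same ones applied to Table 39 later in this chapter: total $= \Sigma\Sigma Y^2 - (\Sigma\Sigma Y)^2/N$, between $= \sum(\Sigma Y)^2/m_r - (\Sigma\Sigma Y)^2/N$, and within as the difference. It needs only the count, sum, and sum of squares for each array. The example figures are the coded array totals printed at the foot of Table 39, and the function name is our own.

```python
def sums_of_squares(m, sum_y, sum_y2):
    """Total, between-arrays, and within-arrays sums of squares from array totals.

    m, sum_y, sum_y2 hold, for each array, the count m_r, the sum of Y,
    and the sum of Y^2. Empty arrays (m_r = 0) are skipped.
    """
    n = sum(m)
    correction = sum(sum_y) ** 2 / n           # (grand sum of Y)^2 / N
    total = sum(sum_y2) - correction
    between = sum(s * s / c for s, c in zip(sum_y, m) if c > 0) - correction
    within = total - between                   # = sum Y^2 - sum (sum Y)^2 / m_r
    return total, between, within

# Array totals (coded scores) from the bottom margin of Table 39
m = [19, 27, 20, 10, 7, 0, 4, 5]
sum_y = [89, 181, 139, 85, 60, 0, 37, 45]
sum_y2 = [547, 1269, 1007, 747, 520, 0, 345, 411]
total, between, within = sums_of_squares(m, sum_y, sum_y2)
print(round(total, 2), round(between, 2), round(within, 2))  # 449.3 183.66 265.65
```

The between-arrays value, 183.66, differs in the second decimal from the 183.65 reported later only because the text rounds $\sum(\Sigma Y)^2/m_r$ to 4580.35 before subtracting.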


SIGNIFICANCE OF LINEAR CORRELATION 

An appreciable correlation between 2 variables which are
linearly related implies that the slopes of the regression lines are
not zero, which in turn implies that the variance of predicted
values is large enough to have some kind of statistical significance.
The variance technique may be used as a test of the significance
of linear regression.

Suppose that we develop the argument in terms of the regression
of Y on X. We may write the linear equation for predicting
Y from X as $Y' = BX + A$. If we think of this regression line
as having been drawn on the scatter diagram, it can readily be
seen that the deviation of any person's Y value from the mean
of the Y's can be expressed in terms of its deviation from the
regression line (or predicted value) plus the deviation of the
predicted value from the mean of the Y's.

$$(Y-\bar Y) = (Y-Y') + (Y'-\bar Y)$$

in which $Y'$ will vary from person to person in accordance with
his X score. If we square all such $(Y-\bar Y)$ deviations and sum
over all cases, we get

$$\begin{aligned}
\sum(Y-\bar Y)^2 &= \sum\left[(Y-Y') + (Y'-\bar Y)\right]^2 \\
&= \sum(Y-Y')^2 + \sum(Y'-\bar Y)^2 + 2\sum(Y-Y')(Y'-\bar Y)
\end{aligned}$$

for which double summation signs are not needed for clarity even
though the summing is over all cases. The last or cross-product
term has to do with a possible relationship between predicted
values and residuals, but, as was shown in Chapter 9, this correlation
is always zero, and hence this last term vanishes.

Therefore the sum of squares can be broken down into 2 components:
residuals, or variation within arrays about the regression line,
and a part depending on the variation of the predicted values
about the mean. If the correlation between X and Y were zero,
this latter component would be zero because one would predict
$\bar Y$ for all cases. The departure of this sum of squares, or of a
variance estimate based thereon, from zero might lead one to
conclude that real correlation exists in the population being
sampled, were it not for the fact that sampling errors ordinarily
operate so as to prevent the obtaining of zero correlation.
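The vanishing of the cross-product term can be checked numerically. In this Python sketch (ours; the eight pairs of scores are hypothetical), the line $Y' = BX + A$ is fitted by ordinary least squares. The total sum of squares should then equal the residual plus the regression sum, and $\sum(Y-Y')(Y'-\bar Y)$ should be zero apart from rounding error.

```python
def linear_decomposition(xs, ys):
    """Fit Y' = BX + A by least squares and split the total sum of squares."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx                     # slope B
    a = my - b * mx                   # intercept A
    pred = [b * x + a for x in xs]    # Y' for each person
    total = sum((y - my) ** 2 for y in ys)
    residual = sum((y - p) ** 2 for y, p in zip(ys, pred))
    regression = sum((p - my) ** 2 for p in pred)
    cross = sum((y - p) * (p - my) for y, p in zip(ys, pred))
    return total, residual, regression, cross

xs = [1, 2, 3, 4, 5, 6, 7, 8]         # hypothetical X scores
ys = [2, 4, 3, 6, 5, 7, 9, 8]         # hypothetical Y scores
total, residual, regression, cross = linear_decomposition(xs, ys)
print(f"total {total:.4f} = residual {residual:.4f} + regression {regression:.4f}; "
      f"cross-product {cross:.1e}")
```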
Before attempting to understand the operation of chance sampling,
we should consider the degrees of freedom associated with
the sums of squares. As usual, the total sum of squares is based
on $N-1$ degrees of freedom. The df for $\sum(Y-Y')^2$ may not
be immediately obvious, but note that, if $N = 2$ and variation
exists for both X and Y, the regression line would necessarily
pass through the 2 points defined by the pair of scores, r would
be numerically unity, and $\sum(Y-Y')^2$ would be zero. In other
words, with $N = 2$, there is no freedom for deviation from the
regression line. From this it would be inferred that N needs to be
reduced by 2, or that df $= N-2$, a deduction which is consistent
with the fact that, in fitting a straight line, 2 constants are
determined from the data, and hence 2 restrictions are imposed on
the N deviations of the type $(Y-Y')$.

Since the df's for the component sums of squares are additive
to that for the total, one can determine the df for the regression or
$\sum(Y'-\bar Y)^2$ term by subtracting the df for residuals from that
for the total: $(N-1) - (N-2) = 1$ as the df for the regression
term. But determination of a df by subtraction does not permit
the additive check on the correctness of the df's which is possible
in case each df is ascertained separately on the basis of some
principle. By what principle could one determine that for the
regression sum of squares the proper df is 1? The value of
$\sum(Y'-\bar Y)^2$ will not be changed by shifting from gross scores to
deviation scores, i.e., by moving the origin to the intersection of
$\bar X$ and $\bar Y$. It will be recalled that the regression equation in
deviation units is $y' = bx$ (where $b = B$ of the gross score form),
and accordingly we may write

$$\sum(Y'-\bar Y)^2 = \sum(y'-\bar y)^2 = \sum(y'-0)^2 = b^2\sum x^2$$


which permits us to examine the source or sources of variation in
the regression sum of squares. Its value depends upon $b^2$ and
$\sum x^2$, but the value of $\sum x^2$ does not depend upon the degree of
correlation. For a fixed set of X's, the freedom of $\sum(Y'-\bar Y)^2$
to vary springs from $b$, i.e., from one value; therefore the df is 1.
A slightly different way of considering the question is to note
that, since $b = r(\sigma_y/\sigma_x)$ and $\sum x^2 = N\sigma_x^2$, the sum of the squares
of the predicted values can be written as

$$\sum(Y'-\bar Y)^2 = r^2\frac{\sigma_y^2}{\sigma_x^2}\left(N\sigma_x^2\right) = Nr^2\sigma_y^2$$

from which it can be argued that, since the variation in predicted
values is a function neither of N nor of the variance of the trait
being predicted, it is a function of one value, the degree of
correlation.
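The three expressions for the regression sum of squares, $\sum(y'-0)^2$, $b^2\sum x^2$, and $Nr^2\sigma_y^2$, should agree for any set of scores. Here is a brief Python check on hypothetical data (ours), using N as the divisor for $\sigma_y^2$ as the text does:

```python
import math

def regression_ss_three_ways(xs, ys):
    """Return sum((y' - 0)^2), b^2 * sum(x^2), and N * r^2 * sigma_y^2 for Y on X."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    dx = [x - mx for x in xs]               # deviation scores x
    dy = [y - my for y in ys]               # deviation scores y
    sxx = sum(d * d for d in dx)
    syy = sum(d * d for d in dy)
    sxy = sum(p * q for p, q in zip(dx, dy))
    b = sxy / sxx                           # slope of y' = bx
    r = sxy / math.sqrt(sxx * syy)
    var_y = syy / n                         # sigma_y^2 with divisor N
    direct = sum((b * d) ** 2 for d in dx)  # predicted deviations y' = bx, squared and summed
    return direct, b * b * sxx, n * r * r * var_y

xs = [3, 5, 6, 8, 9, 11, 12, 14]            # hypothetical X scores
ys = [10, 12, 11, 15, 14, 18, 17, 20]       # hypothetical Y scores
direct, via_b, via_r = regression_ss_three_ways(xs, ys)
print(direct, via_b, via_r)
```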

Now let us return to a brief consideration of sampling, or of the
meaning of the variance estimates which result from dividing the
sums of squares by their df's. On the basis of the null hypothesis,
that the degree of linear correlation is zero for the population
being sampled, the regression line for the population would pass
through $\bar Y$ with zero slope, or parallel to the x axis. Hence
$(Y-Y')$ will equal $(Y-\bar Y)$, and the variance of the residuals
will equal the total variance of the Y's. A sample from the
population will seldom yield zero correlation (or zero regression),
and therefore the residuals will tend to be somewhat reduced, or
$\sum(Y-Y')^2$ will tend to be less than $\sum\sum(Y-\bar Y)^2$. It can be
shown that $\sum(Y-Y')^2/(N-2)$ gives an unbiased estimate of
the population variance when no correlation exists in the
population.

That the estimate based on the regression sum of squares,
$\sum(Y'-\bar Y)^2$, divided by df = 1, is also an unbiased estimate of
the same population variance may not seem plausible, nor is it
easily explained in an elementary treatment. For any sample,
$\sum(Y'-\bar Y)^2$ equals the difference between $\sum\sum(Y-\bar Y)^2$
and $\sum(Y-Y')^2$, and it can be demonstrated that on the
average the value of $\sum\sum(Y-\bar Y)^2 - \sum(Y-Y')^2$ will equal the
average value of $\sum\sum(Y-\bar Y)^2/(N-1)$; that is, the mean value of
$\sum(Y'-\bar Y)^2$ for successive samples will be the mean value of
$\sum\sum(Y-\bar Y)^2/(N-1)$. Since the latter is an unbiased estimate of
the population variance, it follows that $\sum(Y'-\bar Y)^2/1$ must be an
estimate of the same variance.
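The claim that both $\sum(Y'-\bar Y)^2/1$ and $\sum(Y-Y')^2/(N-2)$ average to the population variance under the null hypothesis can be illustrated by simulation. The sketch below is ours and rests on assumptions the text does not state here: X and Y are drawn independently from normal distributions, with $\sigma_y^2 = 4$ and samples of N = 10. Both averages should come out close to 4, up to Monte Carlo error.

```python
import random

def mean_null_estimates(n=10, sigma_y=2.0, reps=20000, seed=1):
    """Average s_p^2 = SS_reg / 1 and s_r^2 = SS_res / (N - 2) over samples with rho = 0."""
    rng = random.Random(seed)
    total_p, total_r = 0.0, 0.0
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ys = [rng.gauss(0.0, sigma_y) for _ in range(n)]   # drawn independently of xs
        mx, my = sum(xs) / n, sum(ys) / n
        sxx = sum((x - mx) ** 2 for x in xs)
        syy = sum((y - my) ** 2 for y in ys)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        ss_reg = sxy ** 2 / sxx        # sum (Y' - Ybar)^2 = b^2 * sum x^2
        ss_res = syy - ss_reg          # sum (Y - Y')^2
        total_p += ss_reg / 1
        total_r += ss_res / (n - 2)
    return total_p / reps, total_r / reps

mean_sp2, mean_sr2 = mean_null_estimates()
print(f"mean s_p^2 = {mean_sp2:.3f}, mean s_r^2 = {mean_sr2:.3f} (population variance 4.0)")
```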

Of the 3 variance estimates, only the estimates based on residuals
and on regression are independent. The sampling distribution
of their ratio is that of F. Let $s_r^2$ stand for the estimate
based on the residual sum of squares and $s_p^2$ stand for the
estimate based on predictions by a linear regression function.
Then, if $s_p^2/s_r^2$, with $n_1 = 1$ and $n_2 = N-2$, falls at or beyond
the .01 level of significance, the null hypothesis becomes suspect.
This means that the $s_p^2$ estimate is larger than expected on the
basis of sampling, from which it may be inferred that regression
is a real source of variation in $\sum(Y'-\bar Y)^2$, i.e., that the slope of
the regression for the population is not zero, or that some
correlation exists.

We have already noted that

$$\sum(Y'-\bar Y)^2 = Nr^2\sigma_y^2$$

Since $\sum(Y-Y')^2$ divided by N equals the error of estimate variance,
previously proved to equal $\sigma_y^2(1-r^2)$, it follows readily
that

$$\sum(Y-Y')^2 = N(1-r^2)\sigma_y^2$$

Accordingly

$$s_p^2 = \frac{\sum(Y'-\bar Y)^2}{1} = \frac{Nr^2\sigma_y^2}{1}$$

and

$$s_r^2 = \frac{\sum(Y-Y')^2}{N-2} = \frac{N(1-r^2)\sigma_y^2}{N-2}$$

Therefore

$$F = \frac{s_p^2}{s_r^2} = \frac{Nr^2\sigma_y^2}{N(1-r^2)\sigma_y^2/(N-2)} = \frac{r^2}{(1-r^2)/(N-2)}$$

which is the square of the t given earlier (p. 146) for testing the
significance of r. Thus, again we have $F = t^2$ when $n_1 = 1$.
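The identity $F = t^2$ is easy to confirm. This sketch (ours) assumes the usual form $t = r\sqrt{N-2}/\sqrt{1-r^2}$ for the t of p. 146, since that formula is not reproduced in this section, and uses, for illustration, r = .5687 and N = 92 from the illustrative problem later in this chapter.

```python
import math

def f_and_t_for_r(r, n):
    """F (n1 = 1, n2 = N - 2) and t for testing r against zero."""
    f = r * r / ((1 - r * r) / (n - 2))
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)   # usual t for r, assumed here
    return f, t

f, t = f_and_t_for_r(0.5687, 92)
print(f"F = {f:.2f}, t = {t:.3f}, t^2 = {t * t:.2f}")
```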

The reader will have noted that, since the required sums of
squares and the resulting F can readily be expressed in terms of
r, there is no need to worry further about a computational scheme
for securing the sums of squares. The easier thing to do is simply
to compute r. After that is done, either the F or the t test may
be used for judging whether the correlation is significant. This
discussion of the linear correlation problem should help the
student appreciate the generality of the analysis of variance
technique and should also provide him with relevant concepts
for understanding the test for curvilinearity of regression, to
which we now turn.


TESTING LINEARITY OF REGRESSION 

We have seen that the correlation ratio is a general measure of
the degree of correlation and that r measures the degree of linear
relationship. Even though the regression of Y on X for a population
be exactly linear, it will be found for a sample that the means
of the arrays will show some deviation from a straight line; hence,
as previously pointed out, the correlation ratio will tend to be
larger than r. How large should the difference between $\eta$ and r be
before one suspects nonlinearity, or how much can the array means
deviate from a straight line by chance? Before the development
of the analysis of variance technique, the inadequate Blakeman
criterion was used to answer the foregoing. In presenting the
currently accepted method, we shall carry the argument through
on the basis of the regression of Y on X.
Imagine a scatter diagram with regression line drawn and the
array mean located in each vertical array. For a score in the rth
array, the deviation of Y from $\bar Y$ can be thought of in terms of its
deviation from the array mean, $\bar Y_r$, plus the deviation of the array
mean from the predicted value, $Y'_r$, plus the deviation of the
predicted value from the total mean. In symbols,

$$(Y-\bar Y) = (Y-\bar Y_r) + (\bar Y_r - Y'_r) + (Y'_r - \bar Y)$$

Squaring and summing for the cases in each array and then
summing over all k arrays (equivalent to summing over all cases),
we have

$$\sum\sum(Y-\bar Y)^2 = \sum\sum(Y-\bar Y_r)^2 + \sum m_r(\bar Y_r - Y'_r)^2 + \sum m_r(Y'_r - \bar Y)^2$$

the cross-product terms having vanished because the component
parts are uncorrelated.

The first component is a sum of squares based on within-array
variation with $N-k$ degrees of freedom. We encountered this
in checking the significance of the correlation ratio, and we then
labeled $s_w^2$ as the variance estimate based thereon.
The second sum involves deviations of array means from linear
regression. Its df will be $k-2$ since there are k means and 2
restrictive constants in $Y'_r$. If $k = 2$, the 2 means cannot vary
from the fitted line. Let us use $s_d^2$ as a symbol for the variance
estimate based on this sum of squares.

The third sum, which has to do with the part of the total variance
predictable by means of linear regression, is very similar to
that occurring a few pages earlier in connection with the F test
of the correlation coefficient. It differs only in that the same
value is predicted for all cases within an array regardless of their
location in the X interval defining the array. This is equivalent
to a linear prediction of the mean of the array. Actually, the
numerical value of $\sum(Y'-\bar Y)^2$ as calculated by $Nr^2\sigma_y^2$, which
equals $r^2\sum\sum(Y-\bar Y)^2$, will be the same as $\sum m_r(Y'_r-\bar Y)^2$
computed directly, provided r was originally determined from a
scatter diagram with the same intervals now being used to define
the arrays. We have already seen that the df for this sum is 1, and
we have used $s_p^2$ as a symbol for the estimate based thereon.




It will be recalled that, in the scheme for testing the significance
of the correlation ratio, the total sum of squares was broken down
into a within-array and a between-array part. We now have a
breakdown into within array (as before) plus 2 additional parts:
the sum $\sum m_r(\bar Y_r-\bar Y)^2$ is broken into

$$\sum m_r(\bar Y_r - Y'_r)^2 + \sum m_r(Y'_r - \bar Y)^2$$

It will also be recalled that

$$\sum m_r(\bar Y_r-\bar Y)^2 = \eta^2\sum\sum(Y-\bar Y)^2$$

and that

$$\sum m_r(Y'_r-\bar Y)^2 = r^2\sum\sum(Y-\bar Y)^2$$

By subtraction, we see that the new sum, $\sum m_r(\bar Y_r - Y'_r)^2$, is
equivalent to $(\eta^2 - r^2)\sum\sum(Y-\bar Y)^2$.

For convenience, we shall now assemble in an analysis of variance
table the several symbolic expressions having to do with
testing the significance of (1) the correlation ratio, (2) the linear
regression coefficient, and (3) nonlinearity of regression. Table 38
gives the sources of variation, the sums of squares and their
equivalents in terms of r or $\eta$, the degrees of freedom, and a
symbol for each of the variance estimates.

Table 38. Analysis of Variance Functions for Bivariate Correlation

$$\begin{array}{lllc}
\text{Source of variation} & \text{Sum of squares} = \text{equivalent} & \text{df} & \text{Estimate} \\ \hline
\text{(a) Linear regression} & \sum m_r(Y'_r-\bar Y)^2 = r^2\sum\sum(Y-\bar Y)^2 & 1 & s_p^2 \\
\text{(b) Deviation of means from line} & \sum m_r(\bar Y_r-Y'_r)^2 = (\eta^2-r^2)\sum\sum(Y-\bar Y)^2 & k-2 & s_d^2 \\
\text{(c) Between-array means} & \sum m_r(\bar Y_r-\bar Y)^2 = \eta^2\sum\sum(Y-\bar Y)^2 & k-1 & s_b^2 \\
\text{(d) Within arrays} & \sum\sum(Y-\bar Y_r)^2 = (1-\eta^2)\sum\sum(Y-\bar Y)^2 & N-k & s_w^2 \\
\text{(e) Residual from line} & \sum\sum(Y-Y'_r)^2 = (1-r^2)\sum\sum(Y-\bar Y)^2 & N-2 & s_r^2 \\
\text{(f) Total} & \sum\sum(Y-\bar Y)^2 & N-1 &
\end{array}$$

Note, in review, that for the sums of squares, their equivalents, and
the df's, the following additions hold true:

(a) + (b) = (c)
(a) + (e) = (f)
(c) + (d) = (f)
(a) + (b) + (d) = (f)
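The equivalents in Table 38 translate directly into a small function. In the sketch below (ours; the function names are hypothetical), each row's sum of squares is computed from $\sum\sum(Y-\bar Y)^2$, $\eta^2$, and $r^2$, and the four additions are checked for both sums of squares and df's. The inputs are the values obtained in the illustrative problem later in this chapter: total 449.30, $\eta^2 = .40874$, $r^2 = .32342$, N = 92, and k = 7 nonempty arrays.

```python
def table38_rows(total_ss, eta2, r2, n, k):
    """Sum of squares and df for rows (a)-(f) of Table 38, from their equivalents."""
    return {
        "a": (r2 * total_ss, 1),                # linear regression
        "b": ((eta2 - r2) * total_ss, k - 2),   # deviation of means from line
        "c": (eta2 * total_ss, k - 1),          # between-array means
        "d": ((1 - eta2) * total_ss, n - k),    # within arrays
        "e": ((1 - r2) * total_ss, n - 2),      # residual from line
        "f": (total_ss, n - 1),                 # total
    }

def additions_hold(rows, tol=1e-6):
    """Check (a)+(b)=(c), (a)+(e)=(f), (c)+(d)=(f), (a)+(b)+(d)=(f) for SS and df."""
    pairs = [("a", "b", "c"), ("a", "e", "f"), ("c", "d", "f")]
    for i in (0, 1):                     # index 0: sum of squares, index 1: df
        for p, q, s in pairs:
            if abs(rows[p][i] + rows[q][i] - rows[s][i]) > tol:
                return False
        if abs(rows["a"][i] + rows["b"][i] + rows["d"][i] - rows["f"][i]) > tol:
            return False
    return True

rows = table38_rows(total_ss=449.30, eta2=0.40874, r2=0.32342, n=92, k=7)
for label, (ss, df) in rows.items():
    print(f"({label}) SS = {ss:7.2f}, df = {df}")
print("additions hold:", additions_hold(rows))
```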

The several useful and permissible F's, or ratios of independent
and unbiased variance estimates, along with the proper df's ($n_1$
and $n_2$ values) for entering the table of F, may be stated in
summary form:

$F_1 = s_b^2/s_w^2$, $n_1 = k-1$, $n_2 = N-k$: significance of correlation ratio
$F_2 = s_p^2/s_r^2$, $n_1 = 1$, $n_2 = N-2$: significance of linear correlation
$F_3 = s_d^2/s_w^2$, $n_1 = k-2$, $n_2 = N-k$: significance of curvilinearity

We have already discussed the first 2 of these F's. If we write
the third in terms of sums and df's, we have

$$F_3 = \frac{\sum m_r(\bar Y_r-Y'_r)^2/(k-2)}{\sum\sum(Y-\bar Y_r)^2/(N-k)} = \frac{(\eta^2-r^2)\sum\sum(Y-\bar Y)^2/(k-2)}{(1-\eta^2)\sum\sum(Y-\bar Y)^2/(N-k)} = \frac{(\eta^2-r^2)/(k-2)}{(1-\eta^2)/(N-k)}$$

which indicates definitely that its value, for given df's, is a
reflection of the difference between the correlation ratio and the
correlation coefficient. Therefore, in testing the significance of the
variation of array means from linear regression, we are testing
the significance of the difference between $\eta$ and r (more
precisely, between $\eta^2$ and $r^2$). If $F_3$ falls beyond the .01
probability level, the hypothesis of linear regression for the
population being sampled is rejected. When this happens, it follows
that the correlation coefficient and a linear regression function for
Y on X are not appropriate measures to use in describing the
relationship.
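The three F ratios can be written as one small function of $\eta^2$, $r^2$, N, and k. The sketch below is ours, and the inputs are again the values of the illustrative problem later in this chapter. The resulting ratios still have to be referred to a table of F or, where SciPy is available, to `scipy.stats.f.sf`, which gives the upper-tail probability.

```python
def bivariate_f_ratios(eta2, r2, n, k):
    """F1, F2, F3 of the text, each with its (n1, n2), from eta^2, r^2, N, and k."""
    f1 = (eta2 / (k - 1)) / ((1 - eta2) / (n - k))         # correlation ratio
    f2 = r2 / ((1 - r2) / (n - 2))                          # linear correlation
    f3 = ((eta2 - r2) / (k - 2)) / ((1 - eta2) / (n - k))   # curvilinearity
    return {"F1": (f1, k - 1, n - k), "F2": (f2, 1, n - 2), "F3": (f3, k - 2, n - k)}

ratios = bivariate_f_ratios(eta2=0.40874, r2=0.32342, n=92, k=7)
for name, (f, n1, n2) in ratios.items():
    print(f"{name} = {f:.2f} with n1 = {n1}, n2 = {n2}")
```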

If one is also interested in testing the significance of the
correlation ratio for X on Y and the linearity of the horizontal array
means, the analysis is carried through with X's substituted for
Y's. Since the number of grouping intervals on the 2 axes need
not be the same, the value of k may differ for the 2 analyses.

ILLUSTRATIVE PROBLEM: r, $\eta$, AND CURVILINEARITY

The foregoing 3 tests of significance and the computations
necessary thereto may be illustrated by the data of Table 39,
which gives the bivariate distribution for the relationship between
initial (sum of scores on trials 1-4) and final (trials 67-70)
performance on the Koerth pursuit rotor. Since it is logical to be
concerned with the prediction of final from initial score, or the
regression of Y on X, we shall be dealing with variations on the
Y variable.

Table 39. Bivariate Scatter for Initial and Final Scores of 92 Boys
on Koerth Pursuit Rotor

Final                              X = Initial Score
Score  Code       0      30      60      90     120     150     180     210        f
740      11                               1                                        1
700      10               1       2       1       1               2       2        9
660       9       1       1       1       4       3               1       2       13
620       8       2       8       2       2       2               1               17
580       7       3       3       7       1       1                       1       16
540       6       2       8       5                                               15
500       5       2       5       3       1                                       11
460       4       3       1                                                        4
420       3       2                                                                2
380       2
340       1       3                                                                3
300       0       1                                                                1
f = m_r          19      27      20      10       7       0       4       5       92 = N
ΣY               89     181     139      85      60       0      37      45      636
ΣY²             547    1269    1007     747     520       0     345     411     4846
(ΣY)²/m_r    416.89 1213.37  966.05  722.50  514.29       0  342.25  405.00  4580.35

In the first place, the correlation coefficient is computed from
the scatter diagram by the method given in Chapter 8. Its value
of .5687 is about .01 lower than the coefficient computed from a
scatter with twice as many intervals. The use of so few intervals
for the X variable would obviously not be recommended for the
computation of r, but in this illustration it is convenient because
of page-space limitations. There is the additional consideration
that for computing the correlation ratio one should avoid having
too few cases per array, which if the sample is small may mean
only a few intervals on the independent variable. At least 12
intervals should be used for the dependent variable. In checking
on linearity, it is necessary that we calculate r from a scatter
with the same grouping intervals used in computing $\eta$, and no
corrections for grouping error are needed.

For the computation of the correlation ratio and for the testing
of its significance, we need the within-arrays, the between-arrays,
and the total sums of squares. These may be computed from
coded scores (deviations from an arbitrary origin in terms of step
intervals), and the entire analysis may be carried through on the
basis of coded scores, so that cumbersomely large figures are
avoided. The reader who wishes to follow the computational
procedure will need to note the following features of Table 39.
The marginal frequencies on the right are for all the Y scores,
and the f's along the bottom margin are the $m_r$'s, or cases per
array. For each vertical array and for the right-hand margin,
$\Sigma Y$ and $\Sigma Y^2$ are computed in terms of coded values (these
correspond to $\Sigma d$ and $\Sigma d^2$ of Chapter 3). Summing across the
$\Sigma Y$ and $\Sigma Y^2$ rows should yield the $\Sigma Y$ and $\Sigma Y^2$ obtained
from the marginal distribution. For this problem, $\Sigma\Sigma Y = 636$
and $\Sigma\Sigma Y^2 = 4846$. The last row, containing the several values
of $(\Sigma Y)^2/m_r$, is summed across for the needed $\sum(\Sigma Y)^2/m_r$,
which is 4580.35 in this example. There is no check on this figure
by calculations based on the margin.




In order to get the sums of squares of deviations, the values 636,
4846, and 4580.35 are substituted in formulas (98) with X replaced
by Y:

$$\sum\sum(Y-\bar Y)^2 = 4846 - \frac{636^2}{92} = 449.30$$

$$\sum\sum(Y-\bar Y_r)^2 = 4846 - 4580.35 = 265.65$$

$$\sum m_r(\bar Y_r-\bar Y)^2 = 4580.35 - \frac{636^2}{92} = 183.65$$

By formula (100) we now obtain

$$\eta^2 = \frac{183.65}{449.30} = .40874; \qquad \eta = .639$$

which is the correlation ratio for Y on X.

The other sums of squares called for in schematic Table 38 may
be calculated from their equivalents in terms of $r^2$ and/or $\eta^2$.
Note that $r^2 = .5687^2 = .32342$.

$$\sum m_r(Y'_r-\bar Y)^2 = (.32342)(449.30) = 145.31$$

$$\sum\sum(Y-Y'_r)^2 = (1-.32342)(449.30) = 303.99$$

$$\sum m_r(\bar Y_r-Y'_r)^2 = (.40874-.32342)(449.30) = 38.34$$
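The cell frequencies of Table 39 are sufficient to reproduce the whole computation, including r. In the Python sketch below (ours), the scatter is held as a dictionary keyed by (X code, Y code). Table 39 does not print codes for X, so the sketch assumes codes 0 through 7 for the columns headed 0, 30, ..., 210; any equally spaced coding leaves r and $\eta$ unchanged. The empty column (150) has no entries, so it does not count among the k arrays.

```python
import math

# Cell frequencies of Table 39: (x_code, y_code) -> f.
# X codes 0..7 stand for the columns headed 0, 30, ..., 210 (coding assumed here);
# Y codes 0..11 are the codes printed beside the final-score intervals.
CELLS = {
    (3, 11): 1,
    (1, 10): 1, (2, 10): 2, (3, 10): 1, (4, 10): 1, (6, 10): 2, (7, 10): 2,
    (0, 9): 1, (1, 9): 1, (2, 9): 1, (3, 9): 4, (4, 9): 3, (6, 9): 1, (7, 9): 2,
    (0, 8): 2, (1, 8): 8, (2, 8): 2, (3, 8): 2, (4, 8): 2, (6, 8): 1,
    (0, 7): 3, (1, 7): 3, (2, 7): 7, (3, 7): 1, (4, 7): 1, (7, 7): 1,
    (0, 6): 2, (1, 6): 8, (2, 6): 5,
    (0, 5): 2, (1, 5): 5, (2, 5): 3, (3, 5): 1,
    (0, 4): 3, (1, 4): 1,
    (0, 3): 2,
    (0, 1): 3,
    (0, 0): 1,
}

def analyze_scatter(cells):
    """Sums of squares, eta^2 (Y on X), and r from a grouped bivariate table."""
    n = sum(cells.values())
    sx = sum(f * x for (x, y), f in cells.items())
    sy = sum(f * y for (x, y), f in cells.items())
    sxx = sum(f * x * x for (x, y), f in cells.items())
    syy = sum(f * y * y for (x, y), f in cells.items())
    sxy = sum(f * x * y for (x, y), f in cells.items())
    # Per-array (vertical column) counts m_r and sums of Y
    m_r, sum_y_r = {}, {}
    for (x, y), f in cells.items():
        m_r[x] = m_r.get(x, 0) + f
        sum_y_r[x] = sum_y_r.get(x, 0) + f * y
    total = syy - sy ** 2 / n
    between = sum(sum_y_r[x] ** 2 / m_r[x] for x in m_r) - sy ** 2 / n
    within = total - between
    eta2 = between / total
    r = (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return {"N": n, "k": len(m_r), "total": total, "between": between,
            "within": within, "eta2": eta2, "r": r}

res = analyze_scatter(CELLS)
print({key: round(value, 4) for key, value in res.items()})
```

Because the grouping is coarse, the r obtained here is the grouped-data value, .5687, matching the text rather than the slightly higher value from a finer scatter.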

The several sums of squares and their respective degrees of freedom
are set forth in Table 40, which contains also the variance
estimates obtained by dividing the sums of squares by their df's.

Table 40. Analysis of Variance Table for Regression of Final (Y)
on Initial Score for Data of Table 39

$$\begin{array}{lrrl}
\text{Source} & \text{Sum of squares} & \text{df} & \text{Variance estimate} \\ \hline
\text{Linear regression} & 145.31 & 1 & 145.31 = s_p^2 \\
\text{Deviation of means from line} & 38.34 & 5 & 7.67 = s_d^2 \\
\text{Between-array means} & 183.65 & 6 & 30.61 = s_b^2 \\
\text{Within arrays} & 265.65 & 85 & 3.13 = s_w^2 \\
\text{Residual from line} & 303.99 & 90 & 3.38 = s_r^2 \\
\text{Total} & 449.30 & 91 &
\end{array}$$

From these variance estimates, we have the following:

For testing the significance of the correlation ratio we have
$F_1 = 30.61/3.13 = 9.8$, which for $n_1 = 6$ and $n_2 = 85$ is highly
significant. The .001 level of significance requires an F of about
4.0.

For testing the significance of linear correlation, i.e., r, we have
$F_2 = 145.31/3.38 = 43.0$, which for $n_1 = 1$ and $n_2 = 90$ is
likewise highly significant, the .001 level being at an F of about
11.6.

For testing linearity of regression, i.e., the departure of the
array means from a straight line, we have $F_3 = 7.67/3.13 = 2.5$,
which for $n_1 = 5$ and $n_2 = 85$ is near the .05 level of significance.
Thus the apparent departure from linearity in Table 39 is not
sufficiently great to lead to rejection of the hypothesis of linearity;
one would, however, question the hypothesis. This is an example
of borderline significance which calls for drawing another sample
or adding more cases before one sets forth a conclusion. For the
problem at hand, a second sample of 90 boys yields a scatter
diagram much like that of Table 39, so we would reject the
hypothesis of linearity of regression.
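Where no table of F is at hand, the upper-tail probability of an F can be computed from the regularized incomplete beta function, $P(F > f) = I_{n_2/(n_2+n_1 f)}(n_2/2,\ n_1/2)$. The sketch below is ours. It evaluates the incomplete beta function by the standard continued fraction (modified Lentz method) using only Python's standard library, and applies it to $F_1$, $F_2$, and $F_3$ from Table 40. With SciPy installed, `scipy.stats.f.sf(f, n1, n2)` should give the same probabilities.

```python
import math

_TINY = 1e-300   # guards against division by zero in the continued fraction

def _beta_cf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > _TINY else _TINY)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))          # even step
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > _TINY else _TINY)
        c = 1.0 + aa / c
        c = c if abs(c) > _TINY else _TINY
        h *= d * c
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))  # odd step
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > _TINY else _TINY)
        c = 1.0 + aa / c
        c = c if abs(c) > _TINY else _TINY
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return math.exp(log_front) * _beta_cf(a, b, x) / a
    return 1.0 - math.exp(log_front) * _beta_cf(b, a, 1.0 - x) / b

def f_upper_tail(f, n1, n2):
    """P(F > f) for the F distribution with n1 and n2 degrees of freedom."""
    return reg_inc_beta(n2 / 2.0, n1 / 2.0, n2 / (n2 + n1 * f))

# Variance estimates and df's from Table 40
table40_tests = {
    "F1, correlation ratio": (30.61 / 3.13, 6, 85),
    "F2, linear correlation": (145.31 / 3.38, 1, 90),
    "F3, curvilinearity": (7.67 / 3.13, 5, 85),
}
p_values = {}
for name, (f, n1, n2) in table40_tests.items():
    p_values[name] = f_upper_tail(f, n1, n2)
    print(f"{name}: F = {f:.2f}, P(F > f) = {p_values[name]:.2g}")
```

The probability for $F_3$ falls between .025 and .05, in line with the text's description of it as near the .05 level and short of the .01 level.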

The student should keep in mind that the test for linearity can
lead to the definite conclusion that the regression is curvilinear
(if F is large enough), whereas a low F does not prove linearity.
Why?

If the hypothesis of linearity is disproved, it follows that the
correlation coefficient is not a suitable figure for describing the
relationship. The correlation ratio can be used to describe the
degree of association, but the form of the relationship should be 
described by a fitted curve or by a verbal description of the gen- 
eral curve tendency of the array means. Some readers will have 
noted that the correlation ratio cannot be considered very descrip- 
tive of the data of Table 39 because of heteroscedasticity. As a 
matter of fact, the lack of homoscedasticity may also mean that 
our analysis of variance test for linearity is subject to question in 
that the assumption of homogeneity of variance is violated. The 
possible extent and direction of the error due to this failure of 
the groups, as defined by intervals on the x axis, to exhibit like 
variances cannot be specified, but it is doubtful whether the error 
is serious. 




APPLICATION TO MULTIPLE CORRELATION 

The reader may recall that the methods given in Chapter 11 for
judging the significance of the multiple correlation coefficient
involved unsatisfactory approximations. In so far as we are
interested in testing the deviation of a multiple R from zero, the
analysis of variance technique provides an exact test which is
applicable when the sample is either small or large.

Let us suppose that Y is a dependent variable which is to be
predicted by a multiple regression equation containing m
independent variables designated by X's. The prediction equation
may be written as

$$Y' = A + B_1X_1 + B_2X_2 + \cdots + B_mX_m$$

in which the B's are the regression coefficients. The deviation of
any individual's Y score from the mean $\bar Y$ can be expressed as
the sum of 2 parts: the deviation of his Y from his predicted
value plus the deviation of the predicted value from the mean of
the Y's; thus,

$$(Y-\bar Y) = (Y-Y') + (Y'-\bar Y)$$

If we square both sides and sum over all cases, we have

$$\sum\sum(Y-\bar Y)^2 = \sum(Y-Y')^2 + \sum(Y'-\bar Y)^2$$

which is exactly analogous to the breakdown used in connection
with the test of the linear correlation coefficient. One part has
to do with residuals about the regression plane, the other with
variations in the predicted values. The cross-product term again
vanishes: it can be shown that there is no correlation between
residuals and predicted values.

As previously, we label $\sum(Y-Y')^2$ as the residual sum of
squares and $\sum(Y'-\bar Y)^2$ as the regression sum of squares. The
total sum of squares will, of course, have $N-1$ degrees of freedom.
The residual sum of squares will lose df's according to the
number of constants in the regression equation. We have the
constant A, and the number of B constants is m; hence
df $= N-(m+1) = N-m-1$ for the residual term. The reader
who does not immediately see the reasonableness of this should
consider the case of 1 dependent and 2 independent variables
with varying scores on $N = 3$ cases. Imagine that the 3 scores
for each case can be used to locate a point for each in
three-dimensional space, and then think of fitting an ordinary plane
to these 3 points. Obviously, the plane can be made to pass through
all 3; hence the prediction would be perfect, and there would be no
freedom for any of the 3 points to vary from the plane. That is,
with $N = 3$ (and with variation on all 3 variables), the multiple R
derived therefrom must be unity.

Now, as to the df for the regression or prediction sum of squares,
we note that for a fixed set of values for the X's the variation of
this term must depend upon the slopes of the regression plane, or
upon the B's. There being m B's, there are m ways in which this
sum can vary; therefore df = m. This is, it will be noted, an
extension of the argument used to explain why df = 1 for testing
the linear correlation coefficient. If our df determinations are
correct, we should have $(N-m-1) + m$ adding to $N-1$,
which is seen to be the case.

In Chapter 11 it was pointed out that the multiple correlation
coefficient can be defined as

$$R_{1\cdot 23\cdots} = \sqrt{1 - \frac{\sigma^2_{1\cdot 23\cdots}}{\sigma_1^2}}$$

in which $\sigma^2_{1\cdot 23\cdots}$ represents the residual variance and $\sigma_1^2$ is
the variance for the dependent variable. Since the residual variance
plus the predicted variance adds to the total, the square of the
multiple R can also be expressed as the ratio of the predicted to the
total variance. (Note that we are here speaking of variances, not
estimates.) By definition, the residual variance is $\sum(Y-Y')^2/N$,
the predicted variance is $\sum(Y'-\bar Y)^2/N$, and the total variance is
$\sum\sum(Y-\bar Y)^2/N$. We may therefore write the multiple correlation
coefficient, using R in order to avoid subscripts, as

$$R^2 = 1 - \frac{\sum(Y-Y')^2/N}{\sum\sum(Y-\bar Y)^2/N}$$

from which it is readily seen that

$$\sum(Y-Y')^2 = (1-R^2)\sum\sum(Y-\bar Y)^2$$

From the alternate way of regarding multiple correlation, we have

$$R^2 = \frac{\sum(Y'-\bar Y)^2/N}{\sum\sum(Y-\bar Y)^2/N}$$

which leads to $\sum(Y'-\bar Y)^2 = R^2\sum\sum(Y-\bar Y)^2$.
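The equality of the two expressions for $R^2$, and the vanishing cross-product on which it rests, can be checked for the case of 2 independent variables. The sketch below (ours, with hypothetical scores) solves the normal equations in deviation form by Cramer's rule, forms the predicted values, and computes $R^2$ once from the residual sum of squares and once from the regression sum of squares.

```python
def two_predictor_r2(x1, x2, y):
    """Fit Y' = A + B1*X1 + B2*X2 by least squares; return R^2 both ways and the cross-product."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    s11 = sum(u * u for u in d1)
    s22 = sum(v * v for v in d2)
    s12 = sum(u * v for u, v in zip(d1, d2))
    s1y = sum(u * w for u, w in zip(d1, dy))
    s2y = sum(v * w for v, w in zip(d2, dy))
    # Normal equations in deviation form, solved by Cramer's rule
    det = s11 * s22 - s12 * s12
    b1 = (s1y * s22 - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    a = my - b1 * m1 - b2 * m2
    pred = [a + b1 * u + b2 * v for u, v in zip(x1, x2)]
    total = sum(w * w for w in dy)
    residual = sum((obs - p) ** 2 for obs, p in zip(y, pred))
    regression = sum((p - my) ** 2 for p in pred)
    cross = sum((obs - p) * (p - my) for obs, p in zip(y, pred))
    return 1 - residual / total, regression / total, cross

x1 = [1, 2, 3, 4, 5, 6, 7, 8]      # hypothetical scores on X1
x2 = [2, 1, 4, 3, 6, 5, 8, 9]      # hypothetical scores on X2
y = [3, 4, 6, 5, 8, 9, 10, 12]     # hypothetical criterion scores Y
r2_from_residuals, r2_from_predictions, cross = two_predictor_r2(x1, x2, y)
print(r2_from_residuals, r2_from_predictions, cross)
```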




Thus the sums of squares have their equivalents in terms of R,
and consequently they may be computed by way of R. The
computation of these sums directly would be a hammer-and-tongs
approach which would involve the laborious task of predicting
by means of the regression equation the Y for each individual.

The foregoing may be assembled in a schematic variance table,
like Table 41.

Table 41. Variance Setup for Testing Significance of Multiple
Correlation Coefficient

$$\begin{array}{llll}
\text{Source} & \text{Sum of squares} = \text{equivalent} & \text{df} & \text{Estimate} \\ \hline
\text{Regression} & \sum(Y'-\bar Y)^2 = R^2\sum\sum(Y-\bar Y)^2 & m & s_p^2 \\
\text{Residual} & \sum(Y-Y')^2 = (1-R^2)\sum\sum(Y-\bar Y)^2 & N-m-1 & s_r^2 \\
\text{Total} & \sum\sum(Y-\bar Y)^2 & N-1 &
\end{array}$$

As in testing the significance of the ordinary correlation
coefficient, we set the null hypothesis to the effect that the
estimate based on the regression sum of squares will differ from
that based on the residual sum only because of chance sampling
errors. The null hypothesis implies that, if the entire population
were measured, the correlation of the dependent variable with
each independent variable would be zero. Now, when a sample
is drawn from such a population, the r's will vary more or less
from zero, with the result that the multiple R will likewise differ
from zero. If the conditions of the null hypothesis hold true, the
sampling distribution of $s_p^2/s_r^2$ follows that of the F distribution
with appropriate degrees of freedom. Note that


$$F = \frac{s_p^2}{s_r^2} = \frac{\sum(Y'-\bar Y)^2/m}{\sum(Y-Y')^2/(N-m-1)} = \frac{R^2\sum\sum(Y-\bar Y)^2/m}{(1-R^2)\sum\sum(Y-\bar Y)^2/(N-m-1)} = \frac{R^2/m}{(1-R^2)/(N-m-1)}$$

hence F is a ratio which depends upon R and the df's. If the
numerator is less than the denominator, we may conclude without
reference to the table of F that R is insignificant. When the
numerator is the larger, one judges the significance of F by
entering the table of F with $n_1 = m$ and $n_2 = N-m-1$. Once R
has been computed, the calculations involved in checking its
significance are so simple that an example would be humdrum.
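The text rightly regards a worked example as humdrum, but a two-line function does make the df bookkeeping explicit. In this sketch (ours; the values R² = .30, m = 3, N = 50 are hypothetical), setting m = 1 recovers the F for a single r, with $n_2 = N - 2$.

```python
def f_for_multiple_r(r2, m, n):
    """F = (R^2 / m) / ((1 - R^2) / (N - m - 1)) with n1 = m and n2 = N - m - 1."""
    n1, n2 = m, n - m - 1
    return (r2 / n1) / ((1 - r2) / n2), n1, n2

# Hypothetical values: R^2 = .30 from m = 3 predictors and N = 50 cases
f, n1, n2 = f_for_multiple_r(0.30, 3, 50)
print(f"F = {f:.3f} with n1 = {n1}, n2 = {n2}")

# With m = 1 the same function gives the F for a single r (here r = .5687, N = 92)
f_single, _, n2_single = f_for_multiple_r(0.5687 ** 2, 1, 92)
print(f"single r: F = {f_single:.2f} with n2 = {n2_single}")
```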

In the chapter on multiple correlation, it was pointed out that
R as computed tends to have a positive bias, the extent of which
could be judged by formula (75). This formula can readily be
derived by the use of estimated residual and trait variances in
place of actual variances in formula (70). Using these best, or
unbiased, variance estimates reduces the bias and so provides an
improved estimate of the population value of R. Formula (75)
gives this improved estimate, but the improvement is negligible
except when N is small, or when m is large relative to N. It should
be stressed that neither the analysis of variance check on the
significance of R nor the improved estimate of R allows for the
fallacy involved in multiple correlation work when from among a
large number of variables a few are chosen for inclusion in the
analysis because they show correlation with the criterion. Such
selection tends to capitalize on r's which are among the highest
partly because of chance errors.

A practical question of considerable importance arises when
one wonders whether the inclusion of additional variables in the
multiple regression equation leads to a significant increase in the
accuracy of prediction, or when one wishes to know whether the
dropping of certain variables results in a significant decrease in
the amount of variance predicted. The inclusion of additional
variables in the equation always tends to reduce the error of
estimate somewhat and leads to an increase in R. Can it be said
that the increase in R possesses statistical significance?

Let $R_1$ be the multiple based on $m_1$ independent variables and
$R_2$ be the value based on $m_2$ variables selected from among the
$m_1$ variables. To test the significance of the difference between
$R_1$ and $R_2$, we take

$$F = \frac{(R_1^2 - R_2^2)/(m_1 - m_2)}{(1-R_1^2)/(N-m_1-1)}$$

with $n_1 = m_1 - m_2$ and $n_2 = N - m_1 - 1$. If F falls beyond
the .01 point, we can safely assume that the apparent gain in
using the additional variable or variables possesses statistical
significance.
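A sketch of the test for the gain in R follows (ours; the values are hypothetical). With $m_2 = 0$ and $R_2 = 0$ the expression reduces to the F for a single multiple R.

```python
def f_for_r_increment(r2_full, m_full, r2_reduced, m_reduced, n):
    """F for the gain from R_2 (m_reduced predictors) to R_1 (m_full predictors)."""
    n1 = m_full - m_reduced
    n2 = n - m_full - 1
    f = ((r2_full - r2_reduced) / n1) / ((1 - r2_full) / n2)
    return f, n1, n2

# Hypothetical values: R_1^2 = .45 with 5 predictors, R_2^2 = .40 with 3 of them, N = 100
f, n1, n2 = f_for_r_increment(0.45, 5, 0.40, 3, 100)
print(f"F = {f:.3f} with n1 = {n1}, n2 = {n2}")
```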

INTRACLASS CORRELATION

Suppose we wish to specify the degree of resemblance of twins
in terms of a correlation coefficient. We have measurements on
just 1 variable, and if we attempt to make a scatter diagram we
are faced with the problem of deciding which member of a pair,
A or A', to assign to one axis and which to the other. This can
be resolved by a double entry scheme: each pair is entered twice,
A as X and A' as Y, and then A' as X and A as Y. An r calculated
from the double entry (symmetrical) table suffers from a
slight bias, which may be avoided by using the formula given
below.

In general, if we have k families (or groups or classes) with m
cases per family, the degree of resemblance can be specified by
the intraclass correlation coefficient, computable by

$$r' = \frac{s_b^2 - s_w^2}{s_b^2 + (m-1)s_w^2}$$

in which we have variance estimates for between families ($s_b^2$;
groups or classes) and for within families ($s_w^2$). If $F = s_b^2/s_w^2$ is
significant, we have evidence for a significant positive r'. Note that
if there is no within-family variation, r' becomes unity. Note also
that r' may be negative, but since in practice $s_w^2$ will rarely be
significantly larger than $s_b^2$, one is seldom confronted with the
necessity for trying to interpret a negative intraclass correlation.

When the number of cases per family varies, the average of the
$m_r$ values is used in place of m in the foregoing formula for r'.
This does not affect the F test as a way of judging the significance
of the correlation.
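Here is a minimal sketch of the intraclass coefficient computed from the between- and within-families variance estimates (ours; the twin-pair scores are hypothetical). Following the text, m is replaced by the average number of cases per family when family sizes differ.

```python
def intraclass_r(groups):
    """Intraclass r' = (s_b^2 - s_w^2) / (s_b^2 + (m - 1) s_w^2) and F = s_b^2 / s_w^2.

    m is taken as the average group size, as the text prescribes when sizes vary.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (mu - grand) ** 2 for g, mu in zip(groups, means))
    ss_within = sum((v - mu) ** 2 for g, mu in zip(groups, means) for v in g)
    s2_b = ss_between / (k - 1)
    s2_w = ss_within / (n - k)
    m = n / k
    r_prime = (s2_b - s2_w) / (s2_b + (m - 1) * s2_w)
    f = s2_b / s2_w if s2_w > 0 else float("inf")
    return r_prime, f

twins = [[10, 12], [15, 14], [8, 9], [20, 18], [13, 13]]   # hypothetical twin-pair scores
r_prime, f = intraclass_r(twins)
print(f"r' = {r_prime:.3f}, F = {f:.2f}")
```

When every family's members have identical scores, $s_w^2 = 0$ and the function returns r' = 1, matching the remark in the text.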

The distinguishing characteristic of an intraclass correlation
situation is that we have k sets of scores on just 1 variable with
no way of ordering the scores within a set (a sort of
interchangeability). It is obvious that r' can be used to describe
group resemblance, regardless of how the groups have been defined.



CHAPTER 16 

Analysis of Variance: Complex 


In the previous chapter an explanation of the fundamental idea
of the analysis of variance technique was attempted, and
applications to relatively simple situations were given. In general,
these situations involved the testing of the significance of the
over-all variation of the means for several groups, the groups
differing on the basis of a single classificatory principle. Such
setups are sometimes referred to as single variable experiments, by
which is meant that groups differing in one known respect are
compared on a dependent variable. For example, income might be
considered a variable which is dependent in part on amount of
education, which accordingly becomes the independent, single
variable for classifying individuals into groups. Or it might be that
the classificatory variable is subject to experimental manipulation,
and we wish to determine whether variations thereof will lead to
performance or response differences. The Wright experiment
cited in Chapter 15 is an example of this.

There are times when it is not only feasible but advisable to
design the experimental setup so as to make one set of data serve
for the testing of hypotheses regarding the separate influence of
two or more independent variables. This type of thing has been
done for a long time in psychological research wherein it has been
possible to classify a total group first one way, then another, and
perhaps a third way. For example, in order to determine some
of the possible correlates of measured intelligence, we may classify
a group of children into urban, suburban, and rural groups; then,
ignoring this basis for grouping, we may classify them as to
occupational level of father; or the classification may be by sex or by
grade location or by age. Such a procedure, in which one variable
is considered at a time, is tantamount to the single variable setup,
even though the same batch of data is made to answer questions
about the effects of different independent variables.

Now it is obvious that, in studying factors associated with
intelligence, we could make a double classification by classifying
our cases simultaneously on two of the variables, or a triple
classification by using three variables, etc. Consider for the moment
a double classification based on the three rural-urban categories
and on sex. This would lead to the assigning of the cases to six
groups, each of which would have a mean IQ. Instead of having
three means for groupings on the basis of the rural-urban
characteristic, we would now have two sets of such means, one set for
each sex. Instead of two means for the total group classified by
sex, we would have three sets of sex means, a set for each of the
three residence categories.

This type of breakdown and similar ones where percentages
instead of means are involved were utilized in psychological
research long before the advent of the analysis of variance technique.
The further breakdown of each sex group for residence status (or
of residence groups for sex) is made in order to see whether
rural-urban differences hold for the sexes separately (or whether the
sex differences are similar for each of the separate residence groups).
Although researchers were not confined to the single variable
approach before the invention of the variance technique, they
were definitely limited in the possible statistical treatment of
their data. Now that we have the analysis of variance method,
we have an adequate statistical technique for checking such
hypotheses as can be formulated concerning the influence of not
only one but two or more variables. The advantages of using
analysis of variance for such situations may be briefly
mentioned.

First, as we have already seen, it provides an over-all test of 
the significance of the difference between two or more means 
when either large or small samples are involved. 

Second, we shall soon see that it leads to a definitely improved
estimate of sampling error when double or triple or higher-order
classification is involved. For instance, when the older method
is used to check the significance of the difference between the two
sex means for the total group, the determination of the sampling
error makes no allowance for likely heterogeneity in intelligence
associated with residence status. The variance method permits a
refined estimate of error by allowing for variation due to one or
more variables when one is testing the differences between groups
classified on the basis of some other variable.

Third, the variance technique provides a means of testing
whether the influence of one independent variable on the
dependent variable is similar for subgroups formed on the basis of a second
independent variable. In a sex-by-residence analysis of IQ's,
the breakdown of each residence group by sex will likely show
that the sex differences are not exactly the same for the three
groups and that rural-suburban-urban differences are not exactly
alike for the separate sex groups. Such inconsistencies as seem
apparent from examination of the six cell means may not be real
for the simple reason that random sampling errors are present.
Before the development of the variance technique there was no
way of testing such apparent inconsistencies, except when each
classificatory characteristic led to just two categories.

This last point has to do with what has been termed interaction,
a concept which is not easily understood. Rather than provide
a detailed discussion now of what is meant by interaction, we will
give a simple illustration. Suppose it has been found that one
learning method has a distinct advantage over a second method,
but that, when the data are broken down for two recall intervals,
the superiority of the first method seems to hold only for those
with the shorter recall interval. This failure of the first method
to be consistently better becomes an example of interaction.
Before concluding that there is evidence for real interaction, one
needs to apply a statistical test. For such a simple breakdown,
one could compute the difference between the first and second
method means, and the standard error of the difference, for those
with the short recall interval; likewise, for those with the long
interval; then one could determine the difference between the
differences and its standard error and therefrom obtain either a
critical ratio or a t as a test of inconsistency. But when one thinks
of a situation with three methods and three or four recall intervals,
it is immediately obvious that such a simple test cannot be
applied.
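The 2 × 2 check just described can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a prescribed routine: the four method-by-interval subgroups are assumed to be independent samples, so that each difference, and the difference between differences, has a standard error equal to the square root of a sum of squared standard errors. The function names and all means and standard errors are hypothetical.

```python
import math


def diff_and_se(mean_a, se_a, mean_b, se_b):
    """Difference between two independent means and its standard error."""
    return mean_a - mean_b, math.sqrt(se_a ** 2 + se_b ** 2)


def interaction_ratio(short, long):
    """Critical ratio for the difference between the two method differences.

    `short` and `long` are (mean_1, se_1, mean_2, se_2) tuples giving the
    means and standard errors of methods 1 and 2 at each recall interval.
    Assumes all four subgroups are independent.
    """
    d_short, se_short = diff_and_se(*short)
    d_long, se_long = diff_and_se(*long)
    dd = d_short - d_long                        # difference between differences
    se_dd = math.sqrt(se_short ** 2 + se_long ** 2)
    return dd, se_dd, dd / se_dd


# Hypothetical summary data: method 1 looks better only at the short interval
dd, se_dd, cr = interaction_ratio(short=(24.0, 1.2, 18.0, 1.2), long=(15.0, 1.2, 14.0, 1.2))
print(dd, round(se_dd, 2), round(cr, 2))  # 5.0 2.4 2.08
```

With small subgroups the ratio would be referred to the t distribution rather than treated as a large-sample critical ratio.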

It is the purpose of this chapter to present the methods of
analysis to be used when classification into groups is made on the
basis of two or more variables. These extensions, which are
restricted by the underlying assumptions of normality and
homogeneity of certain variances, are applicable for either large or
small samples and are particularly helpful with small samples
when it seems imperative that we “get the most out of the
available data.”


DOUBLE OR 2-WAY CLASSIFICATION 

Suppose that the individuals (or their scores) are classifiable
into C groups on the basis of one characteristic or variable and
into R groups on the basis of a second variable. This would lead
to a table with RC cells. Let us first examine the setup where
we have only RC scores, i.e., one score for each cell. It is
convenient to let $X_{rc}$ stand for the score in the rth row and cth column
of such a table. The score in the first row (from the top) and
third column would be symbolized as $X_{13}$. The general pattern
of labeling the scores is set forth in Table 42, which also includes
along the margins a symbol for the several possible row and column
means. Note that the first subscript identifies the row and the
second the column to which a score belongs. The scheme used
in denoting means should be grasped. Thus $\bar X_{.2}$ is the mean for
the second column, whereas $\bar X_{2.}$ is the mean for the second row.
The “dot” in the subscript indicates the direction of the summing
for computing a mean: to get $\bar X_{.2}$ we sum the $X_{r2}$ scores with r
taking on values running from 1 to R.

Table 42. Schema for Labeling Scores and Means for Groups, Double
Classification

$$
\begin{array}{c|ccccccc|c}
\text{Row} & 1 & 2 & 3 & \cdots & c & \cdots & C & \text{Means} \\
\hline
1 & X_{11} & X_{12} & X_{13} & \cdots & X_{1c} & \cdots & X_{1C} & \bar X_{1.} \\
2 & X_{21} & X_{22} & X_{23} & \cdots & X_{2c} & \cdots & X_{2C} & \bar X_{2.} \\
3 & X_{31} & X_{32} & X_{33} & \cdots & X_{3c} & \cdots & X_{3C} & \bar X_{3.} \\
\vdots & & & & & & & & \vdots \\
r & X_{r1} & X_{r2} & X_{r3} & \cdots & X_{rc} & \cdots & X_{rC} & \bar X_{r.} \\
\vdots & & & & & & & & \vdots \\
R & X_{R1} & X_{R2} & X_{R3} & \cdots & X_{Rc} & \cdots & X_{RC} & \bar X_{R.} \\
\hline
\text{Means} & \bar X_{.1} & \bar X_{.2} & \bar X_{.3} & \cdots & \bar X_{.c} & \cdots & \bar X_{.C} & \bar X
\end{array}
$$














The deviation of any score, $X_{rc}$, from the total mean can be
expressed in terms of the deviation of its row mean from the total
mean, $(\bar X_{r.} - \bar X)$, plus the deviation of its column mean from
the total mean, $(\bar X_{.c} - \bar X)$, plus a sort of remainder term which
represents an individual variation over and above that due to
the groups to which the score belongs. To secure an expression
for this term, we note that by definition the term must be the
part of the score deviation (from the total mean) left over after
the two parts specified above have been subtracted.
Accordingly, we have

$$(X_{rc} - \bar X) - [(\bar X_{r.} - \bar X) + (\bar X_{.c} - \bar X)]$$

which simplifies to

$$(X_{rc} - \bar X_{r.} - \bar X_{.c} + \bar X)$$

We may therefore write the following identity:

$$(X_{rc} - \bar X) = (\bar X_{r.} - \bar X) + (\bar X_{.c} - \bar X) + (X_{rc} - \bar X_{r.} - \bar X_{.c} + \bar X)$$

With r running from 1 to R, and c taking values from 1 to C,
there will, of course, be RC individual deviations. We need the
sum of their squares, which sum will involve the squares of the
three parts, plus three cross-product terms that can be shown to
vanish when summed. It may be instructive to indicate how the
sum of squares for all RC scores can be set up. Suppose we begin
by writing the squares of the deviations for scores in the first
column. Each of these squares will involve cross-product terms,
which we shall here ignore except for a plus sign to indicate their
existence. We have for the first-column scores:

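$$
\begin{aligned}
(X_{11} - \bar X)^2 &= (\bar X_{1.} - \bar X)^2 + (\bar X_{.1} - \bar X)^2 + (X_{11} - \bar X_{1.} - \bar X_{.1} + \bar X)^2 + \cdots \\
(X_{21} - \bar X)^2 &= (\bar X_{2.} - \bar X)^2 + (\bar X_{.1} - \bar X)^2 + (X_{21} - \bar X_{2.} - \bar X_{.1} + \bar X)^2 + \cdots \\
&\;\;\vdots \\
(X_{r1} - \bar X)^2 &= (\bar X_{r.} - \bar X)^2 + (\bar X_{.1} - \bar X)^2 + (X_{r1} - \bar X_{r.} - \bar X_{.1} + \bar X)^2 + \cdots \\
&\;\;\vdots \\
(X_{R1} - \bar X)^2 &= (\bar X_{R.} - \bar X)^2 + (\bar X_{.1} - \bar X)^2 + (X_{R1} - \bar X_{R.} - \bar X_{.1} + \bar X)^2 + \cdots
\end{aligned}
$$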

The summing of these squares of deviations for scores of column 1
involves R cases, i.e., r runs from 1 to R; hence we need a symbol
which denotes this fact. Let us use $\sum_r$ for this purpose. Note
that the second term on the right is constant for all R scores, which
permits us to replace the summation sign by R, i.e.,
$\sum_r (\bar X_{.1} - \bar X)^2 = R(\bar X_{.1} - \bar X)^2$.




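The sum of the first column squares, and by analogy the sums
for the other columns, can be written as:

$$
\begin{aligned}
\text{1st col:}\quad \sum_r (X_{r1} - \bar X)^2 &= \sum_r (\bar X_{r.} - \bar X)^2 + R(\bar X_{.1} - \bar X)^2 + \sum_r (X_{r1} - \bar X_{r.} - \bar X_{.1} + \bar X)^2 \\
\text{2nd col:}\quad \sum_r (X_{r2} - \bar X)^2 &= \sum_r (\bar X_{r.} - \bar X)^2 + R(\bar X_{.2} - \bar X)^2 + \sum_r (X_{r2} - \bar X_{r.} - \bar X_{.2} + \bar X)^2 \\
&\;\;\vdots \\
c\text{th col:}\quad \sum_r (X_{rc} - \bar X)^2 &= \sum_r (\bar X_{r.} - \bar X)^2 + R(\bar X_{.c} - \bar X)^2 + \sum_r (X_{rc} - \bar X_{r.} - \bar X_{.c} + \bar X)^2 \\
&\;\;\vdots \\
C\text{th col:}\quad \sum_r (X_{rC} - \bar X)^2 &= \sum_r (\bar X_{r.} - \bar X)^2 + R(\bar X_{.C} - \bar X)^2 + \sum_r (X_{rC} - \bar X_{r.} - \bar X_{.C} + \bar X)^2
\end{aligned}
$$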

We may now sum over the C columns, and for the results we will
need double summation signs. Since the first right-hand term
does not vary from column to column, its sum is merely C times
its value. The second right-hand set of terms involves a constant
times a variable; hence the constant R comes from under the
summation sign. Finally we have the following expression for
the sum of squares for the RC scores:

$$\sum_c\sum_r (X_{rc} - \bar X)^2 = C\sum_r (\bar X_{r.} - \bar X)^2 + R\sum_c (\bar X_{.c} - \bar X)^2 + \sum_c\sum_r (X_{rc} - \bar X_{r.} - \bar X_{.c} + \bar X)^2 \qquad (101)$$

The reader who is worried about whether the cross-product
terms really vanish should note that for the cth column the product
term

$$\sum_r (\bar X_{r.} - \bar X)(\bar X_{.c} - \bar X) = (\bar X_{.c} - \bar X)\sum_r (\bar X_{r.} - \bar X)$$

vanishes because $\sum_r (\bar X_{r.} - \bar X) = 0$. The other two cross-product
sums have as one factor the remainder or residual term; we have
already had examples of a general principle that product terms
involving residuals vanish.

From formula (101) we see that the total sum of squares can
be broken into three additive components: between row means
with R − 1 degrees of freedom, between column means with df
of C − 1, and a remainder. The degrees of freedom for the last
part can be ascertained by a principle analogous to that used for
getting the df for contingency tables. The marginal means
constitute restrictions on the deviation score entries in the rows
and columns: when deviation scores for (R − 1)(C − 1) cells
are filled in, the rest of the entries become fixed; hence df
= (R − 1)(C − 1). Note that the df's for the 3 parts sum to
the df for the total sum of squares, or RC − 1.
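Formula (101) and the degrees-of-freedom bookkeeping can be checked numerically. This Python sketch computes each component directly from its deviation form for an R × C table with one score per cell; the function name and the 3 × 4 table are hypothetical.

```python
def two_way_components(table):
    """Sums of squares and df for an R x C table with one score per cell.

    Each component is computed from its deviation form in formula (101).
    """
    R, C = len(table), len(table[0])
    grand = sum(map(sum, table)) / (R * C)
    row_means = [sum(row) / C for row in table]
    col_means = [sum(table[r][c] for r in range(R)) / R for c in range(C)]
    ss_total = sum((table[r][c] - grand) ** 2 for r in range(R) for c in range(C))
    ss_rows = C * sum((m - grand) ** 2 for m in row_means)
    ss_cols = R * sum((m - grand) ** 2 for m in col_means)
    # Remainder: what is left of each deviation after row and column effects
    ss_rem = sum(
        (table[r][c] - row_means[r] - col_means[c] + grand) ** 2
        for r in range(R)
        for c in range(C)
    )
    df = {"rows": R - 1, "cols": C - 1, "rem": (R - 1) * (C - 1), "total": R * C - 1}
    return {"total": ss_total, "rows": ss_rows, "cols": ss_cols, "rem": ss_rem}, df


ss, df = two_way_components([[3, 5, 4, 8], [6, 7, 9, 10], [2, 4, 3, 7]])
print(abs(ss["total"] - (ss["rows"] + ss["cols"] + ss["rem"])) < 1e-9)  # True
print(df["rows"] + df["cols"] + df["rem"] == df["total"])  # True
```

Any table of numbers would serve; the additivity holds identically because the cross-product sums vanish.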
Dividing the 3 sums of squares by their df's leads to 3 variance
estimates: $s_r^2$ for that based on rows, $s_c^2$ for columns, and
$s_e^2$ for that based on the remainder, sometimes called error, sum
of squares. We have 2 null hypotheses: that the row means are
chance variations from 1 population mean, and that the column
means are also variations from 1 population mean. As in the
simpler situation, if the estimate based on rows is larger than
expected on the basis of chance, it follows that there are real
differences between the population means for the groups defined
by the rows; likewise for column means.

In testing the significance of either of the 2 between-groups
variances when the RC scores belong to RC individuals, we use
the remainder variance estimate as the denominator of the F
ratio. This involves an assumption, to be discussed below under
the heading “Choice of error term,” p. 303. For testing the
variation of row means, we have $F = s_r^2/s_e^2$ with $n_1 = R - 1$ and
$n_2 = (R - 1)(C - 1)$. For column means, $F = s_c^2/s_e^2$ with
$n_1 = C - 1$ and $n_2 = (R - 1)(C - 1)$. If an F so defined happens
to be less than unity, we know at once without reference to
the table for F that the variations of the given means are
insignificant. Note that, since the error variance used in the
denominator is a residual after the parts of the total associated with
between-row and between-column variations have been
subtracted, it follows that we are using as our error term a variance
which has been freed of the influence of heterogeneity with respect
to the 2 classificatory variables being investigated.
For many situations involving double classification, it would
seem that the method just outlined would be definitely limited
in usefulness because no provision has been made for increasing
the size of the sample except by using finer grouping on one or
both of the independent variables. Finer grouping would be
possible, though not always feasible or desirable, for some
classificatory variables, such as degree of illumination or amount of
education or size of type, but for other bases for forming groups
there are definite limits on the number of groups. For example,
in the study of reaction time the number of possible groupings for
sense modality is limited. Actually, the number of cases can be
increased by having additional individuals assigned to each of the
RC cells. Before taking up this needed modification of the setup,
we shall discuss certain specific situations where the scheme as
presented is of practical use. We are not ignoring the possibility
that sometimes RC cases are enough for testing hypotheses even
when both R and C are as small as 4 or 5.


SIGNIFICANCE OF THE DIFFERENCES BETWEEN CORRELATED MEANS


Suppose that the RC scores are for R individuals working under
C different conditions. The mean of a row would be for an
individual, and the mean of a column would be for a specified
condition. Let us consider the limiting case of C = 2. The
between-columns sum of squares, $R\sum_c (\bar X_{.c} - \bar X)^2$, may be written as

$$R(\bar X_{.1} - \bar X)^2 + R(\bar X_{.2} - \bar X)^2$$

which we have already shown (p. 260) reduces to $(R/2)(\bar X_{.1} - \bar X_{.2})^2$,
or to a function of the difference between the two means.

Let us next examine the remainder or error term. If we turn
back to p. 286, where we summed over columns, we readily see
that the remainder sum can be expressed as

$$\sum_r (X_{r1} - \bar X_{r.} - \bar X_{.1} + \bar X)^2 + \sum_r (X_{r2} - \bar X_{r.} - \bar X_{.2} + \bar X)^2$$

in which the c of formula (101) has the explicit values of 1 and 2.
Now the mean of any row, say the rth, is merely the mean of
C = 2 scores; i.e., $\bar X_{r.} = (X_{r1} + X_{r2})/2$, and the total mean must
be the average of the two column means, or $\bar X = (\bar X_{.1} + \bar X_{.2})/2$.
Making these substitutions, we have

$$\sum_r \left(X_{r1} - \frac{X_{r1} + X_{r2}}{2} - \bar X_{.1} + \frac{\bar X_{.1} + \bar X_{.2}}{2}\right)^2 + \sum_r \left(X_{r2} - \frac{X_{r1} + X_{r2}}{2} - \bar X_{.2} + \frac{\bar X_{.1} + \bar X_{.2}}{2}\right)^2$$

which simplifies to

$$\tfrac14\sum_r (X_{r1} - X_{r2} - \bar X_{.1} + \bar X_{.2})^2 + \tfrac14\sum_r (X_{r2} - X_{r1} - \bar X_{.2} + \bar X_{.1})^2$$

These two terms become identical when we change the signs
within the second parentheses, which change is permissible since
the square of a function is the same as the square of its negative,
e.g., $a^2 = (-a)^2$. Hence we have

$$\tfrac12\sum_r \left[(X_{r1} - X_{r2}) - (\bar X_{.1} - \bar X_{.2})\right]^2$$

Now the first parenthetical term is the difference between any
individual’s two scores, say $D_r$, and the second is the difference
between the two column means, which difference, it will be recalled,
is the same as the mean of the differences, $\bar D$. We have finally
the remainder sum of squares as $\tfrac12\sum_r (D_r - \bar D)^2$, or one-half the
sum of the squares of the difference scores about the mean
difference.

The F for comparing two column means becomes

$$F = \frac{\dfrac{R}{2}(\bar X_{.1} - \bar X_{.2})^2 \Big/ 1}{\dfrac{1}{2}\sum_r (D_r - \bar D)^2 \Big/ (R - 1)}$$

with $n_1 = 1$ and $n_2 = R - 1$. This reduces to

$$F = \frac{(\bar X_{.1} - \bar X_{.2})^2}{\dfrac{\sum_r (D_r - \bar D)^2}{R(R - 1)}}$$

which the reader will recognize as $t^2$ for comparing the difference
between means based on sets of correlated scores, with the standard
error of the mean difference estimated by formula (28), p. 108.
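A quick numerical check of this reduction, as a Python sketch with hypothetical paired scores for R = 6 individuals under C = 2 conditions: the remainder sum of squares from the double classification should equal one-half the sum of squared deviations of the difference scores, and the F for the two column means should equal the square of t computed from the difference scores.

```python
import math

# Hypothetical paired scores: X[r][0] under condition 1, X[r][1] under condition 2
X = [[12, 9], [15, 14], [9, 10], [14, 10], [11, 8], [13, 11]]
R = len(X)

m1 = sum(row[0] for row in X) / R
m2 = sum(row[1] for row in X) / R
grand = (m1 + m2) / 2
row_means = [(a + b) / 2 for a, b in X]

# Analysis-of-variance route (C = 2)
ss_cols = R * ((m1 - grand) ** 2 + (m2 - grand) ** 2)
ss_rem = sum(
    (x - rm - cm + grand) ** 2
    for row, rm in zip(X, row_means)
    for x, cm in zip(row, (m1, m2))
)
F = (ss_cols / 1) / (ss_rem / (R - 1))

# Difference-score route: t with the standard error of the mean difference
D = [a - b for a, b in X]
D_bar = sum(D) / R
ss_D = sum((d - D_bar) ** 2 for d in D)
t = D_bar / math.sqrt(ss_D / (R * (R - 1)))

print(math.isclose(ss_rem, ss_D / 2), math.isclose(F, t ** 2))  # True True
```

For these made-up scores both routes give F = t² = 7.5 on 1 and 5 degrees of freedom.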

We have seen in Chapter 6 that in testing the difference
between the means of correlated scores we can, for the large sample
situation, determine the needed sampling error either from the
distribution of differences between paired scores or by means of
the standard error of the difference formula with the correlational
term included. The important thing to note is that the analysis
of variance technique provides a method for testing the
significance of the difference between two or more means based on sets
of correlated scores. The scores may be correlated either because
they are based on the same individuals working under C
conditions or having C trials on some stunt, or because siblings or litter
mates are involved (each of the C groups containing one case from
each of R families), or because we started with R sets of matched
individuals, one from each set being assigned to each of the C
groups. After and only after it has been found that the F for the
C column means is significant are we justified in using the critical
ratio or t technique to test the significance of the difference
between any two of the C means.

The F just discussed has to do with column means. What of
the row means for the given setup? The means of the R rows
represent the mean performance of each of the several individuals,
and a test of the significance of the estimate of variance based on
the between-row sum of squares becomes a test of the significance
of individual differences. Since it is known that individuals do
differ on practically all psychological variables, such a test is
usually a trivial test of the obvious, and hence it is seldom needed.
We may, however, have the situation in which we wonder whether
individual variation is significant in the light of known
measurement or response errors. To this question we now turn.

RELIABILITY OF MEASUREMENT

Suppose the scores in each row represent either the performance
of an individual on different forms of a scale or C measurements
for a given variable. The column means would be the
means for the forms or successive sets of measurements, and the
test of the significance of the differences between column means would be a test of
the difference between the several form means or of the difference
between the means for the C successive sets of trials. For form
means or for trial means, $F = s_c^2/s_e^2$, as outlined above, provides
an over-all test of the significance of these correlated means.

In order better to understand the meaning, in this situation, of
an F based on $s_r^2$, let us again take the limiting case of C = 2;
e.g., suppose two forms of a test have been administered to R
individuals. The algebra is simplified and an interesting, clear-cut
result emerges if we assume that the two forms yield exactly
the same means, i.e., that $\bar X_{.1} = \bar X_{.2} = \bar X$. Then the remainder
sum of squares,

$$\sum_c\sum_r (X_{rc} - \bar X_{r.} - \bar X_{.c} + \bar X)^2$$

becomes

$$\sum_c\sum_r (X_{rc} - \bar X_{r.})^2$$

This can be written without the double summation sign as

$$\sum_r (X_{r1} - \bar X_{r.})^2 + \sum_r (X_{r2} - \bar X_{r.})^2$$

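Since the mean of each row is simply the average of two scores, i.e.,

$$\bar X_{r.} = \frac{X_{r1} + X_{r2}}{2}$$

the above can be written as

$$\sum_r \left(X_{r1} - \frac{X_{r1} + X_{r2}}{2}\right)^2 + \sum_r \left(X_{r2} - \frac{X_{r1} + X_{r2}}{2}\right)^2$$

which by a little algebraic manipulation reduces to

$$\tfrac12\sum_r (X_{r1} - X_{r2})^2$$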

Since we have assumed that the form means are equal, the
difference scores in this expression will have a mean of zero.
Therefore, if we divide the sum of the squared differences by R, the
number of individuals, we will have the variance of the
distribution of differences, which we symbolize by $\sigma_D^2$.
It follows that

$$\tfrac12\sum_r (X_{r1} - X_{r2})^2 = \frac{R}{2}\sigma_D^2$$


Now it can be shown by easy algebra (see pp. 83-84) that

$$\sigma_D^2 = \sigma_1^2 + \sigma_2^2 - 2r_{12}\sigma_1\sigma_2$$

in which the σ's are measures of variation for forms 1 and 2
respectively, and $r_{12}$ is the correlation between forms. If we make the
usual assumption that the two forms are so nearly comparable
that we can replace $\sigma_1$ and $\sigma_2$ by σ, we have

$$\sigma_D^2 = \sigma^2 + \sigma^2 - 2r_{12}\sigma\sigma = 2\sigma^2(1 - r_{12})$$

Then $(R/2)\sigma_D^2$ becomes $R\sigma^2(1 - r_{12})$. But $r_{12}$ defines, and is,
the reliability coefficient, and hence $\sigma^2(1 - r_{12})$ is the error of
measurement variance, $\sigma_e^2$, so that we finally have the remainder
sum of squares equal to

$$R\sigma_e^2$$

Thus, under our simplifying assumptions of equal form means
and equal form variances, assumptions which are usually made
in connection with test reliability, we see that the remainder term
is directly associated with the familiar error of measurement
variance. The remainder term as actually computed from the
sum of squares includes an adjustment for possibly differing form
means but no allowance for differing form variances, so it will
not exactly equal $R\sigma_e^2$. The remainder sum of squares does,
however, lead to an estimate of the error of measurement
variance, not only in the situation where we have an analysis based
on two forms but also where three or more forms are involved;
accordingly, when we test the significance of the variance for
between-row means, we are actually asking whether the individual
differences are significant in light of the variability due to
measurement errors.
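Under the simplifying assumptions the chain of identities can be checked with a toy example. In this Python sketch the two hypothetical forms are constructed to have exactly equal means and equal variances (form 2 merely reshuffles the scores of form 1), and σ is computed with divisor R, as in $\sigma_D^2 = \sum D^2/R$ above. All three routes to the remainder sum of squares should agree.

```python
# Hypothetical scores of R = 5 individuals on two forms; form 2 reshuffles
# form 1, so the two forms have exactly equal means and variances
form1 = [1, 2, 3, 4, 5]
form2 = [2, 1, 3, 5, 4]
R = len(form1)

mean = sum(form1) / R                                 # equals sum(form2) / R
var = sum((x - mean) ** 2 for x in form1) / R         # sigma^2, divisor R
cov = sum((a - mean) * (b - mean) for a, b in zip(form1, form2)) / R
r12 = cov / var                                       # correlation between forms

# Remainder SS for C = 2: with equal column means each residual is X_rc - row mean
ss_rem = sum((x - (a + b) / 2) ** 2 for a, b in zip(form1, form2) for x in (a, b))

sigma2_D = sum((a - b) ** 2 for a, b in zip(form1, form2)) / R  # differences have mean 0
error_var = var * (1 - r12)                           # sigma_e^2 = sigma^2 (1 - r12)

print(round(r12, 3), ss_rem, round(R / 2 * sigma2_D, 3), round(R * error_var, 3))  # 0.8 2.0 2.0 2.0
```

With real forms the means and variances will not match exactly, which is why the computed remainder only estimates $R\sigma_e^2$.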

Since the reliability coefficient is a function of the error variance
relative to the observed trait variance, it follows that a significant
between-individuals variance is evidence for statistically
significant reliability. But one cannot conclude from this that the test
or instrument possesses satisfactory reliability, since coefficients
as low as .20 or .30 or even .10 can be statistically different from
zero if R is sufficiently large. The author does not recommend
this approach to the question of the reliability of measurement
for the simple reason that it is more important to know how reliable
a test is, or how near its reliability approaches unity, than to know
only that it is reliable in the sense of yielding a coefficient
significantly different from zero.

This possible application of the variance technique, however,
points up the fact that it is sometimes meaningful to speak of the
remainder variance as “error” variance. In a wider sense, the
remainder variance can be thought of as the uncontrolled
variation which contributes to the variation of the means of the groups
being compared. Now a little reflection leads one to the conclusion
that the sources of error in research are many and varied.
Sometimes instrumental and/or measurement errors loom large,
sometimes the error associated with the sampling of individuals is
paramount, at other times the intraindividual variation is sizable,
and frequently if the sources of variation are unknown the term
experimental error is used as a catchall. When a particular
variance estimate is referred to as the error variance to be used as the
denominator of the F ratio, the “error” may be any one of or a
combination of the many types of error. In this sense, the variance
estimate based on the remainder sum of squares may be the error
variance even for those situations where we have classifications
into R groups rather than as R individuals, but as will presently
be seen the term which we are now calling the remainder may not
always be the one to utilize as “error.” The within-groups
variance estimate of the last chapter was an “error” variance for
testing the significance of the between-groups variation. In more
complex setups in the analysis of variance, judgment is required
in choosing the appropriate error term.

Parenthetically, it might be pointed out that the test reliability
problem can be tackled by the within- and between-groups
variance estimates. Each person for whom we have two or more, say C,
measurements yields a set of C scores, and the variation within
such a set is partly a function of measurement errors; hence the
over-all within-groups (intraindividual) variance estimate
becomes an error term by which one may test the significance of the
between-groups (between-individual) variance. Note that this
within-groups or intraindividual approach will lead to an estimate
of the error of measurement variance without an adjustment for
possible differences in form means, and that it does not permit a
test of the significance of the difference between form means,
which is possible when the double classification scheme is utilized.
Either of the two methods for determining whether the reliability
is sufficient to possess statistical significance is applicable for an
over-all evaluation of C forms or C successive measurements or
trials. With C forms and R individuals, it is of interest to make a
comparative layout of the two approaches, that based on the
double classification scheme of this chapter and that based on the
single classification procedure of the last chapter. Table 43
contains the essentials.

Table 43. Two Approaches to Test Reliability Problem

Via Double Classification

$$
\begin{array}{l|c|c}
\text{Source} & \text{Variance Estimate} & df \\
\hline
\text{Between individuals (rows)} & s_r^2 & R - 1 \\
\text{Between forms (columns)} & s_c^2 & C - 1 \\
\text{Remainder} & s_e^2 & (R - 1)(C - 1)
\end{array}
$$

Via Single Classification

$$
\begin{array}{l|c|c}
\text{Source} & \text{Variance Estimate} & df \\
\hline
\text{Between individuals} & s_b^2 & R - 1 \\
\text{Within individuals} & s_w^2 & R(C - 1)
\end{array}
$$

Note that both $F = s_r^2/s_e^2$ and $F = s_b^2/s_w^2$ provide tests of
the significance of reliability by way of the significance of
individual differences. The df for the estimate $s_e^2$ is C − 1 smaller
than that for $s_w^2$, a trivial difference in the practical situation
where C is seldom more than 2 or 3, and R is usually 100 or more,
rarely as small as 25 or 50. Both $s_e^2$ and $s_w^2$ constitute estimates
of the error of measurement variance, but because of the
adjustment for differing form means, $s_e^2$ will ordinarily be smaller than $s_w^2$.
Whether either of these estimates is useful as indicating precisely the
measurement error for a particular form depends upon the extent to
which the standard deviations for the several forms are similar.
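To make the comparison of Table 43 concrete, this Python sketch computes both error estimates for the 4 × 3 scores of Table 44 in the computational illustration. Those columns are distances rather than forms, so the example illustrates only the arithmetic: the within-individuals sum of squares splits exactly into the between-columns and remainder sums, and the two error estimates differ in df by C − 1. Because the distance means differ considerably, removing them makes $s_e^2$ much smaller than $s_w^2$ here.

```python
# Table 44 scores: rows = 4 subjects, columns = 3 distances
X = [[13, 29, 17], [4, 9, 19], [8, 30, 37], [9, 27, 53]]
R, C = len(X), len(X[0])
grand = sum(map(sum, X)) / (R * C)
row_means = [sum(row) / C for row in X]
col_means = [sum(X[r][c] for r in range(R)) / R for c in range(C)]

# Single classification: variation of each individual's C scores about that individual's mean
ss_within = sum((x - rm) ** 2 for row, rm in zip(X, row_means) for x in row)
s2_w = ss_within / (R * (C - 1))

# Double classification: column differences are removed as well
ss_cols = R * sum((m - grand) ** 2 for m in col_means)
ss_rem = sum(
    (X[r][c] - row_means[r] - col_means[c] + grand) ** 2
    for r in range(R)
    for c in range(C)
)
s2_e = ss_rem / ((R - 1) * (C - 1))

print(round(ss_within, 2), round(ss_cols, 2), round(ss_rem, 2))  # 1692.0 1095.5 596.5
print(round(s2_w, 2), round(s2_e, 2))  # 211.5 99.42
```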


COMPUTATIONAL ILLUSTRATION

The required computations for testing variation between column
means and between row means will now be set forth. It makes no
difference in the computational procedure whether we have RC
individuals classified into R groups one way and C groups another
way, or R individuals with C scores each, or R sets of C matched
individuals, or RC scores for just 1 individual.
The computation of the required sums of squares involves an
extension of formulas (97), as follows:

$$\sum_c\sum_r (X_{rc} - \bar X)^2 = \frac{1}{RC}\left[RC\sum_c\sum_r X_{rc}^2 - \Big(\sum_c\sum_r X_{rc}\Big)^2\right] \quad\text{for total} \qquad (102a)$$

$$R\sum_c (\bar X_{.c} - \bar X)^2 = \frac{1}{RC}\left[C\sum_c\Big(\sum_r X_{rc}\Big)^2 - \Big(\sum_c\sum_r X_{rc}\Big)^2\right] \quad\text{for columns} \qquad (102b)$$

$$C\sum_r (\bar X_{r.} - \bar X)^2 = \frac{1}{RC}\left[R\sum_r\Big(\sum_c X_{rc}\Big)^2 - \Big(\sum_c\sum_r X_{rc}\Big)^2\right] \quad\text{for rows} \qquad (102c)$$

The sum of squares for the remainder can be obtained by
subtracting the sums for between columns and for between rows
from the total sum of squares. Formulas (102) may look forbidding
at first, but actually the sums based on raw scores are easily
secured by following a plan on the work sheet. Sum each row,
and write the sums on the right-hand margin; sum each column,
and write the sums along the bottom margin. Summing down
the right-hand margin gives the total sum, and summing across
the bottom margin should give the same total sum. Square all
scores and sum to get the first sum in (102a); square all the
right-hand margin sums and then sum to get the first part of (102c);
square all the bottom margin sums and then sum to get the first
part of (102b).

The student may do well to sit down at a calculator and perform
these operations with the scores in Table 44, which contains
visual acuity data on 4 (= R) individuals for 3 (= C) distances
of the stimulus from the eye. Casual examination of the table
indicates that acuity measures are influenced by distance. Do
the means for the 3 distances differ significantly?

Table 44. Data for Visual Acuity, 4 Individuals, 3 Distances
(Monocular, Vernier Method, Coded Scores)*
Column headings are distances in meters.

$$
\begin{array}{c|ccc|c|c}
\text{Subjects} & 5 & 10 & 15 & \sum_c X_{rc} & \bar X_{r.} \\
\hline
1 & 13 & 29 & 17 & 59 & 19.7 \\
2 & 4 & 9 & 19 & 32 & 10.7 \\
3 & 8 & 30 & 37 & 75 & 25.0 \\
4 & 9 & 27 & 53 & 89 & 29.7 \\
\hline
\sum_r X_{rc} & 34 & 95 & 126 & 255 & \\
\bar X_{.c} & 8.5 & 23.7 & 31.5 & & 21.2 = \bar X
\end{array}
$$

$$\sum\sum X_{rc} = 255, \qquad \sum_r\Big(\sum_c X_{rc}\Big)^2 = 18{,}051, \qquad \sum\sum X_{rc}^2 = 7709, \qquad \sum_c\Big(\sum_r X_{rc}\Big)^2 = 26{,}057$$

* From Walker, K. L., Factors in vernier acuity and distance discrimination.
Doctoral Dissertation, Stanford University, California, 1947.

The required sums are also included in the table. Substituting
these in the above formulas gives:

$$\tfrac{1}{12}\left[12(7709) - (255)^2\right] = 2290.25 \quad\text{for the total sum of squares}$$

$$\tfrac{1}{12}\left[3(26{,}057) - (255)^2\right] = 1095.50 \quad\text{for the between-columns sum of squares}$$

$$\tfrac{1}{12}\left[4(18{,}051) - (255)^2\right] = 598.25 \quad\text{for the between-rows sum of squares}$$

Subtracting the sum of the last 2 from the total gives 596.50 as
the remainder sum of squares.
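The work-sheet plan translates directly into code. This Python sketch applies the raw-score formulas (102) to the scores of Table 44, forming the margin sums first, and should reproduce the sums of squares just obtained; the variable names are illustrative only.

```python
# Table 44: visual acuity scores, rows = R subjects, columns = C distances (5, 10, 15 m)
X = [[13, 29, 17], [4, 9, 19], [8, 30, 37], [9, 27, 53]]
R, C = len(X), len(X[0])

row_sums = [sum(row) for row in X]                             # right-hand margin
col_sums = [sum(X[r][c] for r in range(R)) for c in range(C)]  # bottom margin
total = sum(row_sums)
assert total == sum(col_sums)  # the two margin totals must agree
sum_sq = sum(x * x for row in X for x in row)                  # sum of squared scores

ss_total = (R * C * sum_sq - total ** 2) / (R * C)                   # (102a)
ss_cols = (C * sum(s * s for s in col_sums) - total ** 2) / (R * C)  # (102b)
ss_rows = (R * sum(s * s for s in row_sums) - total ** 2) / (R * C)  # (102c)
ss_rem = ss_total - ss_cols - ss_rows                                # by subtraction

print(total, sum_sq, sum(s * s for s in row_sums), sum(s * s for s in col_sums))
# 255 7709 18051 26057
print(ss_total, ss_cols, ss_rows, ss_rem)
# 2290.25 1095.5 598.25 596.5
```

Working with integer sums until the final division keeps the arithmetic exact, which mirrors the advantage of the raw-score formulas on a desk calculator.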

These results are assembled in Table 45 along with the df's and
the variance estimates.

Table 45. Variance Table for Data of Table 44

$$
\begin{array}{l|r|c|r}
\text{Source} & \text{Sum of Squares} & df & \text{Variance Estimate} \\
\hline
\text{Distance} & 1095.50 & 2 & 547.75 \\
\text{Subjects} & 598.25 & 3 & 199.42 \\
\text{Remainder} & 596.50 & 6 & 99.42 \\
\hline
\text{Total} & 2290.25 & 11 &
\end{array}
$$

For the influence of distance we have
$F = 547.75/99.42 = 5.51$, which for $n_1 = 2$ and $n_2 = 6$ is
significant at slightly better than the P = .05 level (additional data
in Walker's dissertation leave no doubt: distance does have an
effect). This is a situation in which experimentally induced
differences are so large that they can be demonstrated with only
4 cases.
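When $n_1 = 2$ the F distribution has a closed-form upper tail, $P(F \ge f) = (1 + 2f/n_2)^{-n_2/2}$, so the significance level quoted above can be checked without a table. This Python sketch uses the entries of Table 45; the closed form is a standard result for $n_1 = 2$ only, and for other values of $n_1$ one would fall back on the F table or a statistics library.

```python
# Entries from Table 45
ss_distance, df_distance = 1095.50, 2
ss_remainder, df_remainder = 596.50, 6

s2_c = ss_distance / df_distance     # between-distances variance estimate
s2_e = ss_remainder / df_remainder   # remainder (error) variance estimate
F = s2_c / s2_e

# When n1 = 2 the upper tail of F is (1 + 2F/n2) ** (-n2/2);
# this shortcut does not apply for other values of n1.
n1, n2 = df_distance, df_remainder
assert n1 == 2
p = (1 + n1 * F / n2) ** (-n2 / 2)
# Inverting the same closed form gives the 5 per cent point of F for (2, n2) df
f_crit = n2 / n1 * (0.05 ** (-n1 / n2) - 1)

print(round(F, 2), round(p, 3), round(f_crit, 2))  # 5.51 0.044 5.14
```

The observed F of 5.51 just exceeds the 5 per cent point of about 5.14, in line with "slightly better than the P = .05 level."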

DOUBLE CLASSIFICATION WITH MORE THAN ONE SCORE PER CELL

Suppose that we have m scores in each cell of schematic Table
42. This would lead to a mean for each cell, and about each such
mean we would have the variation of m scores. The mean for
the rth row would be the mean of all mC scores in the row, or the
mean of the C cell means of the row; the mean of the cth column
would be the mean of the mR scores in the column, or the mean
of the cell means in the column; in the remainder term, previously
defined as $(X_{rc} - \bar X_{r.} - \bar X_{.c} + \bar X)$, we would replace $X_{rc}$ by
$\bar X_{rc}$. The total sum of squares for all mRC scores would include
a between-column, a between-row, and a remainder component,
plus an additional part which would involve the variation within
cells about the cell means. A convenient label for this new part
would be $\sum\sum(X_{rc} - \bar X_{rc})^2$, in which it is understood that there
are m such deviations in each cell. A more precise notation would
be $\sum_t\sum_r\sum_c (X_{trc} - \bar X_{rc})^2$, in which $X_{trc}$ is the tth score in the cell
involving the rth row and cth column.
The variance table would take on the form indicated in Table
46, in which the term "remainder" has been replaced by "interaction."

Table 46. Variance Schema for Double Classification with m Scores per Cell

Source         Sum of Squares                                                                 df                Variance Estimate
Rows           $mC\sum_r(\bar{X}_{r\cdot} - \bar{X})^2$                                       R − 1             $s^2_r$
Columns        $mR\sum_c(\bar{X}_{\cdot c} - \bar{X})^2$                                      C − 1             $s^2_c$
Interaction    $m\sum_r\sum_c(\bar{X}_{rc} - \bar{X}_{r\cdot} - \bar{X}_{\cdot c} + \bar{X})^2$    (R − 1)(C − 1)    $s^2_{rc}$
Within cells   $\sum_t\sum_r\sum_c(X_{trc} - \bar{X}_{rc})^2$                                 mRC − RC          $s^2_w$
Total          $\sum_t\sum_r\sum_c(X_{trc} - \bar{X})^2$                                      mRC − 1

Note that the first 2 sums of squares are simply m
times the corresponding sums for 1 score per cell, and that the
df's for these sums and for the one corresponding to the remainder
sum are not changed. The df for the within-cells sum depends
upon the fact that there are m − 1 degrees of freedom in each of
the RC cells, which gives RC(m − 1) = mRC − RC as the df.
We now have 4 estimates of variance.

This simple modification of the setup for the analysis of variance
leads to 2 definite advantages. We can increase the precision or
dependability of our results by basing the analysis on more scores
or cases, and we can test the possible significance of the inter-
action component. Before we discuss the first advantage, it is
necessary that we consider the question of possible interaction,
the exposition of which is facilitated by an example, which will
also serve to illustrate the required computations.

















Analysis of Variance: Complex

The computational formulas are extensions of previously used
formulas. A ΣX and a ΣX² are calculated for each cell. Summing
the RC ΣX² values gives ΣΣΣX² as the sum of all the mRC squared
scores. Summing the ΣX values in each row gives $\sum_c \Sigma X_{rc}$, and
summing the ΣX values in each column gives $\sum_r \Sigma X_{rc}$. These
become sums along the margins, which marginal values sum down,
and across, to the total sum of the mRC scores, ΣΣΣX. The
sum of scores in any particular cell will be symbolized as $\Sigma X_{rc}$.
The formulas are

$$\text{Total sum of squares} = \frac{1}{mRC}\left[mRC\,\Sigma\Sigma\Sigma X^2 - (\Sigma\Sigma\Sigma X)^2\right] \tag{103a}$$

$$\text{Between-rows sum of squares} = \frac{1}{mRC}\left[R\sum_r\Big(\sum_c \Sigma X_{rc}\Big)^2 - (\Sigma\Sigma\Sigma X)^2\right] \tag{103b}$$

$$\text{Between-columns sum of squares} = \frac{1}{mRC}\left[C\sum_c\Big(\sum_r \Sigma X_{rc}\Big)^2 - (\Sigma\Sigma\Sigma X)^2\right] \tag{103c}$$

$$\text{Within-cells sum of squares} = \frac{1}{m}\left[m\,\Sigma\Sigma\Sigma X^2 - \sum_r\sum_c(\Sigma X_{rc})^2\right] \tag{103d}$$

The interaction sum of squares is obtained as the remainder when
the numerical values of formulas (103b, c, d) are subtracted from
the total sum of squares.
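Formulas (103a) to (103d) need only the RC cell sums and the grand sum of squared scores, so they can be sketched as a single Python function. This is a minimal illustration, not the book's procedure; the name `replicated_two_way_ss` and its argument layout are our own assumptions, and the example feeds it the cell sums and grand ΣX² listed for the pursuit-learning data in Table 48.

```python
def replicated_two_way_ss(cell_sums, sum_sq_all, m):
    """Sums of squares for an R x C layout with m scores per cell.

    cell_sums  -- R x C nested list; cell_sums[r][c] is the sum of the
                  m scores in row r, column c
    sum_sq_all -- sum of all mRC squared scores
    m          -- number of scores per cell
    """
    R, C = len(cell_sums), len(cell_sums[0])
    N = m * R * C
    T = sum(sum(row) for row in cell_sums)                 # grand total of scores
    row_totals = [sum(row) for row in cell_sums]
    col_totals = [sum(cell_sums[r][c] for r in range(R)) for c in range(C)]

    total = (N * sum_sq_all - T ** 2) / N                              # (103a)
    rows = (R * sum(t ** 2 for t in row_totals) - T ** 2) / N          # (103b)
    cols = (C * sum(t ** 2 for t in col_totals) - T ** 2) / N          # (103c)
    within = (m * sum_sq_all
              - sum(s ** 2 for row in cell_sums for s in row)) / m     # (103d)
    interaction = total - (rows + cols + within)           # obtained by subtraction
    return {"total": total, "rows": rows, "columns": cols,
            "within": within, "interaction": interaction}


# Cell sums and grand sum of squares for the pursuit data (Table 48), m = 20.
ss = replicated_two_way_ss([[217, 219], [124, 175]], sum_sq_all=7835, m=20)
for source, value in ss.items():
    print(f"{source:12s} {value:10.4f}")
```

Because the row multiplier is R and the column multiplier is C, the same function also handles tables that are not square.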

Table 47 contains data on learning with 2 variations as to prac-
tice sessions and 2 variations as to rest interval between trials.
For each combination of conditions there are 20 (= m) cases.
The scores are recorded in a 2 by 2 or 4-cell table. Table 48 is a
work-sheet layout in which are recorded sums of scores, sums of
squared scores, and means, for cells and for the margins. The
lower right corner contains values for the total group of 80 cases.
For the sums of squares (of deviations) we have the following:

Total: $\frac{1}{80}\left[80(7835) - (735)^2\right] = 1082.1875$

Rows: $\frac{1}{80}\left[2(436^2 + 299^2) - (735)^2\right] = 234.6125$

Columns: $\frac{1}{80}\left[2(341^2 + 394^2) - (735)^2\right] = 35.1125$

Within cells: $\frac{1}{20}\left[20(7835) - (217^2 + 219^2 + 124^2 + 175^2)\right] = 782.4500$

Interaction: $1082.1875 - (234.6125 + 35.1125 + 782.4500) = 30.0125$



Table 47. Coded Learning Scores (Sum of Scores on 29th and 30th
Trials) for Koerth Pursuit Rotor*

Rest          Practice Sessions
Interval      5 (M T W Th F)        3 (M W F)

3 minutes      9  14   6  10         8  10  11  14
              10  15  10  11         9   7   9  10
              14  17  10  11         9  12  13  14
              10   7   8  15        12  13   7  17
              12   8  14   6         9  12   8  15

1 minute       2   6   1   9        11  12   9   7
               5   9   2  11         9   6  11   9
              14   1   1   8         6   8  11  12
              14   4  11   5         9   7   4  10
               6   8   2   5        13   6   7   8

* Data from Renshaw, M. J., The effects of varied arrangements of practice
and rest on proficiency in the acquisition of a motor skill. Unpublished Doctor's
Dissertation, Stanford University, California, 1947.
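As a check on the transcription, the scores of Table 47 can be re-summed to reproduce the work-sheet values of Table 48. This sketch assumes one reading that differs from the scan: the fourth score on the fourth line of the 3-minute, 5-session cell is taken as 15 (the scan shows 16), since only 15 makes that cell sum to 217 and the grand sum of squared scores come to 7835, as printed. The helper names are ours.

```python
# Scores transcribed from Table 47, listed cell by cell (20 per cell).
SCORES = {
    ("3 minutes", "5 sessions"): [9, 14, 6, 10, 10, 15, 10, 11, 14, 17,
                                  10, 11, 10, 7, 8, 15, 12, 8, 14, 6],
    ("3 minutes", "3 sessions"): [8, 10, 11, 14, 9, 7, 9, 10, 9, 12,
                                  13, 14, 12, 13, 7, 17, 9, 12, 8, 15],
    ("1 minute", "5 sessions"): [2, 6, 1, 9, 5, 9, 2, 11, 14, 1,
                                 1, 8, 14, 4, 11, 5, 6, 8, 2, 5],
    ("1 minute", "3 sessions"): [11, 12, 9, 7, 9, 6, 11, 9, 6, 8,
                                 11, 12, 9, 7, 4, 10, 13, 6, 7, 8],
}


def summarize(xs):
    """Return (sum of scores, sum of squared scores, mean)."""
    s = sum(xs)
    return s, sum(x * x for x in xs), s / len(xs)


def margin(label, position):
    """Pool every cell whose key has `label` at `position` (0 = rest, 1 = sessions)."""
    pooled = [x for key, xs in SCORES.items() if key[position] == label for x in xs]
    return summarize(pooled)


cells = {key: summarize(xs) for key, xs in SCORES.items()}
rows = {rest: margin(rest, 0) for rest in ("3 minutes", "1 minute")}
cols = {sess: margin(sess, 1) for sess in ("5 sessions", "3 sessions")}
grand = summarize([x for xs in SCORES.values() for x in xs])

for name, (s, s2, mean) in {**cells, **rows, **cols, "total": grand}.items():
    print(name, s, s2, round(mean, 4))
```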

Table 48. Sums and Means for Data of Table 47

Rest                        Practice Sessions
Interval                 5 (M T W Th F)    3 (M W F)      Totals

3 minutes    ΣX               217             219            436
             ΣX²             2543            2547           5090
             Mean          10.8500         10.9500        10.9000

1 minute     ΣX               124             175            299
             ΣX²             1102            1643           2745
             Mean           6.2000          8.7500         7.4750

Totals       ΣX               341             394            735
             ΣX²             3645            4190           7835
             Mean           8.5250          9.8500         9.1875



The interaction sum of squares can also be calculated by direct
substitution into the definition formula of Table 46, which will
involve RC quantities to be squared, summed, and multiplied by
m. We have

$$(10.85 - 10.90 - 8.525 + 9.1875)^2 = (.6125)^2$$
$$(10.95 - 10.90 - 9.85 + 9.1875)^2 = (-.6125)^2$$
$$(6.20 - 7.475 - 8.525 + 9.1875)^2 = (-.6125)^2$$
$$(8.75 - 7.475 - 9.85 + 9.1875)^2 = (.6125)^2$$

which when added and multiplied by 20 lead to 30.0125, or the
value obtained by subtraction.

Any reader who is surprised that the above 4 values involved
in computing the interaction sum of squares directly are numer-
ically equal should ponder the fact that for the given situation
the df for the interaction term is (2 − 1)(2 − 1), or 1.

Actually, the easiest way to compute the interaction sum of
squares for a 2 by 2 table is to work with the 4 cell sums of scores.
The formula is

$$\frac{(\Sigma X_{11} + \Sigma X_{22} - \Sigma X_{12} - \Sigma X_{21})^2}{4m}$$

For this problem we have

$$\frac{1}{80}(217 + 175 - 219 - 124)^2 = \frac{1}{80}(49)^2 = 30.0125$$
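The two routes to the interaction sum of squares, the definition formula of Table 46 applied to cell means and the 2 by 2 shortcut applied to cell sums, can be compared side by side. A minimal sketch; the function names are ours, and row and column means are taken as means of cell means, which is valid here because every cell has the same m.

```python
def interaction_ss_from_means(cell_means, m):
    """Definition formula of Table 46: m times the summed squared interaction residuals."""
    R, C = len(cell_means), len(cell_means[0])
    row_means = [sum(row) / C for row in cell_means]
    col_means = [sum(cell_means[r][c] for r in range(R)) / R for c in range(C)]
    grand = sum(row_means) / R
    residuals = [[cell_means[r][c] - row_means[r] - col_means[c] + grand
                  for c in range(C)] for r in range(R)]
    return m * sum(e * e for row in residuals for e in row), residuals


def interaction_ss_2x2(s11, s12, s21, s22, m):
    """Shortcut for a 2 by 2 table, working from the 4 cell sums of scores."""
    return (s11 + s22 - s12 - s21) ** 2 / (4 * m)


ss_def, resid = interaction_ss_from_means([[10.85, 10.95], [6.20, 8.75]], m=20)
ss_short = interaction_ss_2x2(217, 219, 124, 175, m=20)
print(round(ss_def, 4), round(ss_short, 4),
      [[round(e, 4) for e in row] for row in resid])
```

The 4 residuals all have magnitude .6125 with alternating signs, which is the 1-df point made in the text.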

The sums of squares and resulting variance estimates are brought
together in Table 49.

Table 49. Analysis of Variance for Pursuit Learning

Source                                   Sum of Squares    df    Variance Estimate
Rest interval (rows)                         234.6125       1         234.6125
Sessions (columns)                            35.1125       1          35.1125
Interaction                                   30.0125       1          30.0125
Individual differences (within cells)        782.4500      76          10.2954
Total                                       1082.1875      79

We have 4 variance estimates which for the given situation are all
estimates of the same population variance under the null hypothesis
conditions: no row effect, no column effect, and no interaction. It is
appropriate for this table to use $s^2_w$, the within-cells estimate, as
the denominator of F to test the row, the column, and the interaction
effects. We have for interaction $F_{rc} = 30.0125/10.2954 = 2.92$, which
falls short of the F of about 4.0 required for significance at the .05
level. This indicates that the apparent failure of the 4 cell means to
be consistent, in either direction, with the marginal means (or with
each other) is attributable to chance fluctuations. For this particular
problem the chance fluctuation is the sampling of individuals (plus a
relatively small component having to do with errors of measurement).

Next consider the effect on pursuit learning of varying the rest
interval and varying the sessions. For sessions we have $F_c =
35.1125/10.2954 = 3.41$, which is not large enough to lead us to
reject the null hypothesis; but since nonrejection of the null hy-
pothesis does not prove the hypothesis, we can conclude only that
the effect, if it exists, is not large enough to be demonstrated by
the number of cases used. The between-rows or rest-interval
effect is highly significant as judged by $F_r = 234.6125/10.2954 =
22.79$, which is double the F needed for the .001 level of signifi-
cance. Now the fact that the interaction is not significant permits
us to conclude that the rest-interval effect is similar for 5 sessions
and for 3 sessions per week. If the interaction had been signifi-
cant, we would need to qualify our conclusion about the effect
of the rest interval.
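The 3 F ratios of Table 49, together with exact tail probabilities, can be sketched using only the Python standard library. The sketch assumes the probability is obtained through $F(1, \nu) = t^2$ and the classical finite series for Student's t with an even number of degrees of freedom (here $\nu = 76$); `f_upper_tail_df1` is a hypothetical helper written for this illustration, and it rejects odd $\nu$ rather than guess.

```python
import math


def f_upper_tail_df1(F, nu):
    """P(F(1, nu) > F) for even nu, using F = t**2 and the classical
    finite series for Student's t with an even number of degrees of freedom."""
    if nu % 2 != 0:
        raise ValueError("this sketch handles even nu only")
    theta = math.atan(math.sqrt(F) / math.sqrt(nu))
    c2 = math.cos(theta) ** 2
    term = total = 1.0
    for j in range(1, nu // 2):                 # terms up to cos**(nu - 2)
        term *= c2 * (2 * j - 1) / (2 * j)
        total += term
    return 1.0 - math.sin(theta) * total        # 1 - P(|t| < sqrt(F))


ms_within = 782.45 / 76                          # individual differences, 76 df
table_49 = {"rest interval (rows)": 234.6125,     # each of these has 1 df
            "sessions (columns)": 35.1125,
            "interaction": 30.0125}
results = {}
for source, ms in table_49.items():
    F = ms / ms_within
    results[source] = (F, f_upper_tail_df1(F, 76))
    print(f"{source:22s} F = {F:6.2f}   P = {results[source][1]:.5f}")
```

The probabilities agree with the verbal conclusions above: only the rest-interval effect passes the .05 level, and it passes .001 as well.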

ILLUSTRATIONS OF INTERACTION 

Reference to actual examples of statistically significant interac-
tion may help clarify its meaning. For this purpose we shall again
use some data on visual acuity from the experiment by Walker.*
For visual acuity (low score, better acuity) by 2 methods of meas-
urement (depth and vernier) with binocular and monocular vision,
we have means as given in Table 50. The marginal means are
markedly different, and it is readily seen that the cell means (each
based on 108 determinations) are not consistent with the marginal
values. The ratio of 1 to 3 for the binocular vs. monocular

* Walker, E. L., Factors in vernier acuity and distance discrimination. Un-
published Doctor's Dissertation, Stanford University, California, 1947.





Table 50. Visual Acuity: Interaction of Type of Measurement with Eyes

               Depth    Vernier    Total
Binocular       .08      1.07       .57
Monocular       .24      1.50       .87
Total           .16      1.28       .72


means of .08 and .24 varies from the 2 to 3 ratio for the means on
the right-hand margin, and the ratio of near 1 to 13 for the values
of .08 and 1.07 differs from the 1 to 8 ratio of .16 to 1.28. In other
words, the amount of difference between binocular and monocular
acuity depends upon the type of measurement.
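One way to put numbers on this inconsistency is to compare each cell mean of Table 50 with the value an additive (no-interaction) pattern would predict from its margins, alongside the monocular-to-binocular ratios discussed above. A minimal sketch; the variable names are ours, and the margins are recomputed from the cell means (.575, 1.285, and so on) rather than taken from the rounded printed values.

```python
# Cell means from Table 50 (low score = better acuity).
means = {
    ("binocular", "depth"): 0.08, ("binocular", "vernier"): 1.07,
    ("monocular", "depth"): 0.24, ("monocular", "vernier"): 1.50,
}
eyes = ("binocular", "monocular")
measures = ("depth", "vernier")

row_mean = {e: sum(means[e, k] for k in measures) / 2 for e in eyes}
col_mean = {k: sum(means[e, k] for e in eyes) / 2 for k in measures}
grand = sum(means.values()) / 4

# Departure of each cell from the additive (no-interaction) prediction.
residual = {(e, k): means[e, k] - row_mean[e] - col_mean[k] + grand
            for e in eyes for k in measures}

# Monocular-to-binocular ratio within each type of measurement.
ratio = {k: means["monocular", k] / means["binocular", k] for k in measures}
print({key: round(v, 4) for key, v in residual.items()})
print({key: round(v, 2) for key, v in ratio.items()})
```

The text argues through ratios, a multiplicative comparison, whereas the interaction sum of squares measures additive departures like these residuals (each ±.0675 here). Both views show that the binocular-monocular difference depends on the measure.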
One variable investigated in the experiment was the distance
of the stimulus from the subject. Since distance is an ordered


[Fig. 15. Simple interaction: eyes by distance. Fig. 16. Simple interaction: measures by distance. Abscissa of both: distance.]


variable, it is possible to picture the interaction by making a
graph, with acuity as the ordinate and distance along the x axis.
Figure 15 shows the relationship of acuity (average of the 2 types
of measures) and the 3 distances used. Note the difference be-
tween the 2 curves: the significant interaction for eyes and dis-
tance actually means that the 2 curves are different. This lack
of parallel behavior of curves is more striking in Fig. 16, which
illustrates the interaction of measures with distance, for binocular
and monocular combined. In this study there was also a signifi-
cant variance for the subjects by distance interaction, from which



one concludes that the relationship between acuity and distance
varies from person to person (see Fig. 17).

Walker also investigated the effect of stimulus rod width and
size of aperture. A plot of the results for acuity (ordinate) against
rod width (abscissa) for 3 apertures (A large, B medium, C small)
is given in Fig. 18 as another possible example of interaction,
except that this time the apparent interaction is so slight as not
to possess statistical significance. This being the case, it can be
said that the effect of rod width is independent of aperture (and



vice versa). Contrast this with the possible conclusion, based on
a highly significant F, that distance affects acuity. When we
note the interaction effect depicted in Fig. 16, we see that such a
conclusion does not hold at all for the depth measure. Thus, sig-
nificant interaction always calls for a qualification, sometimes
drastic, regarding a main effect. It is entirely possible for an
effect to be in opposite directions for different conditions, and the
over-all effect need not be significant for this to occur.

CHOICE OF ERROR TERM IN 2-WAY CLASSIFICATION 

Now that we have learned something about the meaning of 
interaction and have had a couple of examples which illustrate 
the computations and the way hypotheses can be tested, we must 
specifically consider an as yet unmentioned question: Which vari- 
ance estimate is the correct one to use as the error term, that is, 
as the denominator for the F ratio? The answer depends upon 



[Fig. 17. Simple interaction: distance by subjects (abscissa: distance). Fig. 18. Nonsignificant interaction: aperture by stimulus rod (abscissa: rod width).]






the mathematical model that is appropriate for a given situation.
Three models have been set forth by the mathematical statis-
ticians.† These are referred to as the components of variance
model, the fixed constants model, and the mixed model. Let us
define these for the 2-way classification setup.

We have the components of variance model when both classifica-
tions involve sampling. Such would be the case when rows stand
for individuals and columns stand for judges (each of whom has
rated each individual). The individuals and the judges are re-
garded as random samples from normally distributed populations:
normal distribution of individuals with respect to the ratings and
normal distribution for the rating characteristics of the judges.

We have a fixed constants model when no random sampling is
involved so far as the bases of the classifications are concerned.
Such is the case when the classifications depend upon such things
as size, distance, time interval, degree of illumination, etc.; or on
such unordered things as sense modality, sex, method, diagnostic
group, etc. The setup in Table 47 involves the fixed constants
model; neither the rest intervals nor the sessions were chosen at
random.

We have a mixed model when one basis of classification involves
sampling and the other fixed constants. Table 44 illustrates a
typical mixed model, typical in that one basis of classification is
individuals.

Each of the 3 models calls for precisely the same breakdown of
the sum of squares and of the degrees of freedom, and each leads
to 3 variance estimates plus a within-cells estimate in case we
have more than 1 score per cell. It should be noted that the
within-cells scores can stand for 2 kinds of replication. We might
have replication in the sense of having carried out the experiment
with more than 1 person in each cell (but with different persons
from cell to cell) as in Table 47, or we might have a replication of
measures on the same person or persons. Thus in Table 44 we
could have m measures per person under each of the C conditions.

† Some of the confusion in textbooks (including the first edition of this one)
regarding the choice of the error term is likely due to the fact that they were
written before the models were explicitly stated. A complete statement of the
models is given in E. G. Mentzer's 1953 pamphlet "Tests by the analysis of
variance," prepared under the direction of Dr. Paul R. Rider and available
as WADC Technical Report, Wright Air Development Center, Wright-
Patterson Air Force Base, Ohio.




(We are not here concerned with replication in the sense of a
repetition of the entire experiment by another investigator.)

Actually, for the working statistician the precise formulation of
the possible mathematical models is not nearly so important as
the deductions therefrom regarding the meanings of the several
variance estimates. Earlier (p. 253) we attempted to explain
the meaning of the variance estimate $s^2_b$. Perhaps the student
should review the steps that led us to state what $s^2_b$ estimates.
This we symbolized by an arrow, meaning "is an estimate of."
Another way of saying this is that the arrow points to the expected
value of $s^2_b$.

The general model for 2-way classification may be written as

$$(X_{rck} - \bar{X}) = \alpha_r + \beta_c + (\alpha\beta)_{rc} + e_{rck} \tag{104}$$

in which the deviation of a score from the over-all population
mean is thought of in terms of a row contribution, α; a column
contribution, β; an interaction effect, (αβ); and a normally dis-
tributed random error part, $e_{rck}$. The subscript k indicates that
we have replication, m scores per cell, with k taking on values
1 through m, but the m scores in each cell are independent of the scores
in all other cells. Both α and β, and also (αβ), are expressed in
deviation score form, i.e., possess the property that $\sum_r \alpha_r = 0$,
$\sum_c \beta_c = 0$, and $\sum_r\sum_c(\alpha\beta)_{rc} = 0$.

For the fixed constants model we replace α and β by A and B,
thus

$$(X_{rck} - \bar{X}) = A_r + B_c + (AB)_{rc} + e_{rck} \tag{105a}$$

For the components of variance model we replace α and β by
a and b, thus

$$(X_{rck} - \bar{X}) = a_r + b_c + (ab)_{rc} + e_{rck} \tag{105b}$$

and the mixed model can be written as (with columns standing
for fixed constants)

$$(X_{rck} - \bar{X}) = a_r + B_c + (aB)_{rc} + e_{rck} \tag{105c}$$

The $a_r$, $b_c$, $(ab)_{rc}$, and $(aB)_{rc}$ are all assumed to be random samples
from normally distributed populations of effects having variances
$\sigma^2_a$, $\sigma^2_b$, $\sigma^2_{ab}$, and $\sigma^2_{aB}$. For the fixed values $A_r$, $B_c$, and $(AB)_{rc}$,
no assumption as to distribution of effects is required.

When the m scores per cell represent measurement replication,
$s^2_w$ will be taken as an estimate of $\sigma^2_e$; when the m scores per cell
involve m individuals, $s^2_w$ will be regarded as an estimate of indi-
vidual difference variance, designated by $\sigma^2_i$. It is to be under-
stood that $\sigma^2_i$ has 2 components: true score variance and error of
measurement variance.

We are now ready to examine the various possible situations
involving 2-way classification in order to point out just what is
being estimated by $s^2_r$, $s^2_c$, $s^2_{rc}$, and $s^2_w$. Once this is done, we
will be in a position to choose an appropriate variance estimate
as the denominator, or error, term for F. The question of vari-
ance homogeneity will be discussed after a consideration of 9 sit-
uations (cases) involving 2-way classification (p. 311).

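Case I. Fixed constants model, with m scores (m persons) per
cell, a total of mRC individuals:

$$s^2_r \rightarrow \sigma^2_i + mC\frac{\sum_r A_r^2}{R - 1}$$

$$s^2_c \rightarrow \sigma^2_i + mR\frac{\sum_c B_c^2}{C - 1}$$

$$s^2_{rc} \rightarrow \sigma^2_i + m\frac{\sum_r\sum_c (AB)_{rc}^2}{(R - 1)(C - 1)}$$

$$s^2_w \rightarrow \sigma^2_i$$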

The general principle in forming an F ratio is to choose 2 esti-
mates which differ (in their expected values) by 1 term only, the
term involving the effect being tested. Accordingly, $s^2_w$ is the
correct denominator for $F_r$, $F_c$, and $F_{rc}$, for testing row, column,
and interaction effects, respectively. Note that interaction, if
present, has nothing whatsoever to do with the main (row and
column) effects: no interaction term enters the expected values of
$s^2_r$ and $s^2_c$. This is true because the interaction is a fixed,
not a random, effect. If the interaction is significant, one must
be on guard in drawing conclusions about the main effects;
qualifications will be needed, as we learned in our discussion
(pp. 301-303) of the meaning of interaction.

Case II. Fixed constants model, RC individuals, 1 in each cell.
For this situation we have no $s^2_w$, hence no estimate of $\sigma^2_i$, but
the other estimates have precisely the same expected values here
as under Case I, with m = 1. It is readily seen that we cannot
form any F ratios for this situation: no significance test is pos-
sible; hence such an experimental setup should be avoided. If
one can make the a priori assumption of zero interaction, one
can use $s^2_{rc}$ as the error term for $F_r$ and $F_c$. That such an assump-
tion may be indefensible is indicated by the fact that significant
interactions have emerged in about half of the psychological re-
search studies where interactions were testable. Note, however,
that if the use of $s^2_{rc}$ leads to a significant F for either rows or
columns, significance can be safely claimed, since the error term
used will tend to be too large because of interaction. The real
danger is that the use of $s^2_{rc}$ will too often lead to a false accept-
ance of the null hypothesis, and hence the overlooking of a real
effect.

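Case III. Fixed constants model, RC individuals, 1 per cell,
but each is measured m times. Writing $\sigma^2_t$ for true-score
(individual difference) variance, we have

$$s^2_r \rightarrow \sigma^2_e + m\sigma^2_t + mC\frac{\sum_r A_r^2}{R - 1}$$

$$s^2_c \rightarrow \sigma^2_e + m\sigma^2_t + mR\frac{\sum_c B_c^2}{C - 1}$$

$$s^2_{rc} \rightarrow \sigma^2_e + m\sigma^2_t + m\frac{\sum_r\sum_c (AB)_{rc}^2}{(R - 1)(C - 1)}$$

$$s^2_w \rightarrow \sigma^2_e$$

We see immediately that this design has exactly the same diffi-
culties as Case II. The resulting $s^2_w$ estimate is useless; if we did
use $s^2_w$ as the denominator for testing, say, $s^2_r$, a significant F
would be meaningless because we would not know whether its
significance was attributable to a real row effect or to real indi-
vidual differences or to a combination of the 2.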
Case IV. Fixed constants model, only 1 person measured m
times under each of the RC conditions. If we drop the individual-
difference component from the last set of expected values, we will
have indicated what each $s^2$ estimates. As for Case I, the appropriate
error term for all 3 F's is $s^2_w$, but any conclusion one draws from a
significant F must be carefully scrutinized for meaning. It can only
mean that the effect holds for the 1 person used in the experiment, with
no assurance whatsoever that a repetition of the experiment with
another person, either in the same or in a different laboratory, will
lead to a confirmation of the results. In other words, no general-
ization is possible except the trivial one that the effect holds for a
particular individual, useful only in case one's scientific horizon is
limited to 1 person.

The foregoing cases just about exhaust the possible situations
for 2-way classification involving the fixed constants model. If
it has occurred to the reader that each of m cases might be meas-
ured under all the RC conditions, he should be apprised that this
would involve 3-way classification, to be discussed later. The
important thing to have noted is that clear-cut results, permitting
generalizations to a population of individuals, are possible only
by the setup of Case I. We have listed the other 3 cases because
it may be helpful to know what not to do.

Case V. Components of variance model; rows stand for R
individuals and columns stand for, say, C judges, with m (ordi-
narily m will not exceed 2) ratings by each judge on each indi-
vidual. The ratings, which must all be directed toward the same
trait, might be based on observed, or on a transcribed record of,
behavior of the R individuals. (The judges might find it diffi-
cult to rule out memory when making the ratings.) Instead of
C judges making ratings we might have C examiners or testers,
each testing each of the R individuals twice on, say, the Rorschach.
We have a sample of individuals and a sample of judges (or exam-
iners). The expected values of the variance estimates are:

$$s^2_r \rightarrow \sigma^2_e + m\sigma^2_{ab} + mC\sigma^2_a$$

$$s^2_c \rightarrow \sigma^2_e + m\sigma^2_{ab} + mR\sigma^2_b$$

$$s^2_{rc} \rightarrow \sigma^2_e + m\sigma^2_{ab}$$

$$s^2_w \rightarrow \sigma^2_e$$

It is obvious that $s^2_w$ can be used as the error term for testing
the interactive effect, but since $s^2_w$ is nothing more than an esti-
mate of error of measurement variance, the conclusion from a
significant F is that interaction holds only for these particular R
individuals and C judges, with no assurance that repetition of the
investigation with R other individuals and C other judges would
lead to interaction. As to the main effects, it is obvious that $s^2_{rc}$
becomes the appropriate (and only correct) term to use for $F_r$
and $F_c$. A significant $F_r$ would mean a dependable differentiation
of individual variation over and above variation due to measure-
ment error and judge by individual interaction, and a significant
$F_c$ would indicate real variation from judge to judge in a possible
population of judges.

Case VI. Components of variance model, same as Case V
except that m = 1. No estimate of $\sigma^2_e$ is available, but $s^2_{rc}$ would
still be the error term for both F's.

Remark on components of variance model. Actually one is
hard put to find good illustrations in psychology for this model.
Aside from the illustration given above, it is difficult to find other
classifications based on sampling. Any student who attempts to
find other illustrations should keep in mind that it must be pos-
sible to classify a given score simultaneously in 2 different ways,
each involving sampling.

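Case VII. Mixed model; rows stand for R individuals, columns
involve C fixed constants (fixed conditions having fixed effects),
and measurement replication leads to m scores per cell.

$$s^2_r \rightarrow \sigma^2_e + m\sigma^2_{aB} + mC\sigma^2_a$$

$$s^2_c \rightarrow \sigma^2_e + m\sigma^2_{aB} + mR\frac{\sum_c B_c^2}{C - 1}$$

$$s^2_{rc} \rightarrow \sigma^2_e + m\sigma^2_{aB}$$

$$s^2_w \rightarrow \sigma^2_e$$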

The interaction term can be tested by $F_{rc} = s^2_{rc}/s^2_w$; if $F_{rc}$ is
significant one concludes that the differential responses shown by
these R individuals are larger than expected on the basis of error
of measurement. Individual by conditions interactions are usually
found to be significant. It will be recalled that in the mixed model
the interaction term, $(aB)_{rc}$, is regarded as a random variable,
and as such it becomes a source of random variation which, if
real, will affect both main effects, both the between-rows and the
between-columns terms. We see from the foregoing that $s^2_{rc}$
becomes the proper error term for testing both $s^2_r$ and $s^2_c$. To
use $s^2_w$ for this purpose is simply not defensible; if, for example,
$s^2_c/s^2_w$ is significant it might be so because of real column differ-
ences or because of real interaction or because of a combination
of the 2. Ordinarily in this situation $s^2_r$ is not tested for signifi-
cance, since it reflects individual differences, which are always real
unless the measurements are completely unreliable.
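A small Monte Carlo sketch can make the Case VII argument concrete. It samples repeatedly from model (105c) with every fixed column effect $B_c$ set to 0 and averages the 4 variance estimates. The numbers of rows, columns, and replications and the effect standard deviations are arbitrary illustrative choices, and drawing $(aB)_{rc}$ independently for every cell is one common formalization (sometimes called the unrestricted mixed model). Formulations that restrict the interaction to sum to zero across columns differ in what $s^2_r$ estimates.

```python
import random


def mean_squares(data):
    """data[r][c] is a list of m scores; return s2_r, s2_c, s2_rc, s2_w."""
    R, C, m = len(data), len(data[0]), len(data[0][0])
    cell = [[sum(xs) / m for xs in row] for row in data]
    row_m = [sum(row) / C for row in cell]
    col_m = [sum(cell[r][c] for r in range(R)) / R for c in range(C)]
    g = sum(row_m) / R
    ss_r = m * C * sum((v - g) ** 2 for v in row_m)
    ss_c = m * R * sum((v - g) ** 2 for v in col_m)
    ss_rc = m * sum((cell[r][c] - row_m[r] - col_m[c] + g) ** 2
                    for r in range(R) for c in range(C))
    ss_w = sum((x - cell[r][c]) ** 2
               for r in range(R) for c in range(C) for x in data[r][c])
    return {"r": ss_r / (R - 1), "c": ss_c / (C - 1),
            "rc": ss_rc / ((R - 1) * (C - 1)), "w": ss_w / (R * C * (m - 1))}


def simulate_mixed(R=6, C=3, m=2, sd_a=2.0, sd_aB=1.5, sd_e=1.0,
                   reps=2000, seed=1):
    """Average variance estimates over repeated samples from model (105c)
    with every fixed column effect B_c set to 0."""
    rng = random.Random(seed)
    totals = {"r": 0.0, "c": 0.0, "rc": 0.0, "w": 0.0}
    for _ in range(reps):
        data = []
        for _r in range(R):
            a_r = rng.gauss(0.0, sd_a)                    # random person effect
            row = []
            for _c in range(C):
                aB_rc = rng.gauss(0.0, sd_aB)             # random person-by-condition effect
                row.append([a_r + aB_rc + rng.gauss(0.0, sd_e) for _k in range(m)])
            data.append(row)
        for key, value in mean_squares(data).items():
            totals[key] += value
    return {key: value / reps for key, value in totals.items()}


avg = simulate_mixed()
# Theoretical targets: s2_c and s2_rc near 1 + 2 * 1.5**2 = 5.5, s2_w near 1.0,
# s2_r near 5.5 + 2 * 3 * 2.0**2 = 29.5.
print({key: round(value, 2) for key, value in avg.items()})
```

With no column effect at all, $s^2_c$ still averages far above $s^2_w$ but close to $s^2_{rc}$. Dividing by $s^2_w$ would therefore signal a column effect that is nothing but interaction.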

Case VIII. Mixed model, same as Case VII except that m = 1
(no measurement replication). This does not provide an $s^2_w$, but
$s^2_{rc}$ is again the error term for testing both $s^2_r$ and $s^2_c$. The setup
in Table 44 falls under Case VIII.

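Case IX. Mixed model; R rows stand for R individuals and
columns stand for C forms of a test (the reliability of measurement
setup discussed on pp. 290-294).

$$s^2_r \rightarrow \sigma^2_e + C\sigma^2_a$$

$$s^2_c \rightarrow \sigma^2_e + R\frac{\sum_c B_c^2}{C - 1}$$

$$s^2_{rc} \rightarrow \sigma^2_e$$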

It will be recalled that $s^2_{rc}$, which was previously (p. 287)
labeled a remainder term, was shown to depend solely upon errors
of measurement under the assumptions usually made in connec-
tion with test reliability. We see now that these assumptions in-
volve the a priori assumption of no interaction, an assumption
which implies, among other things, that possible practice effects
are not differential from person to person. Note that in case inter-
action is operating, all 3 of the variance estimates in Case IX will
involve interaction in the manner indicated for Case VII; hence
$s^2_{rc}$ is the appropriate error term regardless of whether there is or
is not interaction. $F_c = s^2_c/s^2_{rc}$ would be a test of the difference
between form means or over-all practice effect or both (one
wouldn't know which), and $F_r = s^2_r/s^2_{rc}$ would be a test of whether
reliable discriminations between individuals were being made, in
spite of interaction if present.

Remark about measurement replication: We have seen that
having $s^2_w$ as an estimate of $\sigma^2_e$ does not provide us with a useful
error term (for F) in the testing of hypotheses about main effects
(and sometimes about interaction) under any of the 3 mathemat-
ical models. This illustrates a general principle: when an estimate
of error of measurement variance is used as the denominator of
F, no generalization to a population of persons is possible, and
hence no generalization of import to science. This raises the ques-
tion as to whether measurement replication is worth while. The
answer is yes, particularly when it is known that a single measure-
ment is not very reliable. By replicating measurement we will
obtain more reliable scores in the form of the average of m values;
hence one source of variability in the data will be reduced. The
student who has not noticed that the analyses involving measure-
ment replication are, in essence, dealing with average scores for
individuals should ponder further.

Homogeneity of variance assumption. For Cases I, II, and
III it is assumed that individual difference variance is the same
from cell to cell. For Cases IV through IX it is assumed that
error of measurement variance is homogeneous from cell to cell.
The assumption is testable (say, by Bartlett's test, p. 248) only
for Cases I, IV, V, and VII.
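The choices of error term reached for Cases I through IX can be collected into a small lookup table, which makes the pattern easy to scan. This is a summary sketch of the conclusions stated above, not a general rule. The identifiers (`ERROR_TERM`, `f_ratio`, and the `s2_*` keys) are hypothetical names introduced for the illustration, and `None` marks an effect for which the text gives no defensible test.

```python
# Error term for each F test, as stated for Cases I-IX above.
# None: the text gives no defensible test for that effect.
ERROR_TERM = {
    "I":    {"rows": "s2_w",  "columns": "s2_w",  "interaction": "s2_w"},
    # II: s2_rc usable for rows/columns only if zero interaction is assumed a priori.
    "II":   {"rows": None,    "columns": None,    "interaction": None},
    "III":  {"rows": None,    "columns": None,    "interaction": None},
    # IV: valid, but conclusions hold only for the 1 person measured.
    "IV":   {"rows": "s2_w",  "columns": "s2_w",  "interaction": "s2_w"},
    "V":    {"rows": "s2_rc", "columns": "s2_rc", "interaction": "s2_w"},
    "VI":   {"rows": "s2_rc", "columns": "s2_rc", "interaction": None},
    # VII: rows (individuals) are ordinarily not tested.
    "VII":  {"rows": "s2_rc", "columns": "s2_rc", "interaction": "s2_w"},
    "VIII": {"rows": "s2_rc", "columns": "s2_rc", "interaction": None},
    "IX":   {"rows": "s2_rc", "columns": "s2_rc", "interaction": None},
}
NUMERATOR = {"rows": "s2_r", "columns": "s2_c", "interaction": "s2_rc"}
HOMOGENEITY_TESTABLE = {"I", "IV", "V", "VII"}


def f_ratio(case, effect, estimates):
    """F for `effect` under `case`, given a dict of variance estimates."""
    denominator = ERROR_TERM[case][effect]
    if denominator is None:
        raise ValueError(f"Case {case}: no defensible F test for {effect}")
    return estimates[NUMERATOR[effect]] / estimates[denominator]


# The pursuit-learning data (Table 49) follow Case I.
table_49 = {"s2_r": 234.6125, "s2_c": 35.1125, "s2_rc": 30.0125,
            "s2_w": 782.45 / 76}
print(round(f_ratio("I", "rows", table_49), 2))
```

Raising an error rather than silently substituting a denominator mirrors the text's advice that designs such as Case II and Case III should be avoided, not patched.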

TRIPLE CLASSIFICATION

Suppose that we wish to arrange an investigation so as to let
one set of data serve to determine whether the variation of a
dependent variable is due to or associated with variation on 3
independent variables. Again, the term independent variable is
being used in its broad sense. It might be a "real" variable like
illumination, temperature, amount of food, length of rest interval;
or it might be a variable having to do with qualitative differences,
such as kind of food, type of motivation or incentive, various
psychological sets. It makes no difference whether the variables
are manipulatable in the laboratory, as would be true of all those
mentioned, or whether the desired variation is secured by appro-
priate choice of cases.

It is necessary that we be able to assign individuals or scores to
each combination of groupings made possible by whatever classi-
fications we have on the 3 independent variables. Let us suppose
that there are C categories on one variable, R on another, and B
on a third. For purposes of exposition and as a systematic way
of arranging the data, let the C categories define C columns, the
R categories R rows, and the B categories B blocks. Let $X_{rbc}$
represent the score in the rth row, bth block, and cth column,
and let us assume for the time being that we have only 1 score
for each combination. Thus $X_{324}$ would be the only score in the



Table 51. Score and Sum Schema for Triple Classification

Block b (one such panel for each block, b = 1 through B):

Row        Column 1      ···   Column c      ···   Column C      Sum              Mean
1          X_1b1         ···   X_1bc         ···   X_1bC         Σ_c X_1bc        X̄_1b.
r          X_rb1         ···   X_rbc         ···   X_rbC         Σ_c X_rbc        X̄_rb.
R          X_Rb1         ···   X_Rbc         ···   X_RbC         Σ_c X_Rbc        X̄_Rb.
Sum        Σ_r X_rb1     ···   Σ_r X_rbc     ···   Σ_r X_rbC     Σ_r Σ_c X_rbc
Mean       X̄_.b1         ···   X̄_.bc         ···   X̄_.bC                          X̄_.b. (mean, block b)

Sums through blocks:

Row r      Σ_b X_rb1     ···   Σ_b X_rbc     ···   Σ_b X_rbC     Σ_b Σ_c X_rbc
Sum        Σ_r Σ_b X_rb1 ···   Σ_r Σ_b X_rbc ···   Σ_r Σ_b X_rbC Σ_r Σ_b Σ_c X_rbc

Means for rows by columns:

Row r          X̄_r.1     ···   X̄_r.c         ···   X̄_r.C         Row mean X̄_r..
Column means   X̄_..1     ···   X̄_..c         ···   X̄_..C         X̄_...

third row, second block, and fourth column. The scores may be
arranged in some such systematic order as that in Table 51, which
should be studied carefully by the reader.

Note in particular how the various sums are specified and their
location in the table. The first 2 subscripts in $\sum_c X_{11c}$ indicate
that this sum has to do with scores in the first row and first block,
and that in the summing process c takes on values running from
1 to C. The general expression for all such sums is $\sum_c X_{rbc}$. The
symbol $\sum_r X_{r11}$ stands for the sum of scores in the first column and
first block; r takes on values of 1 to R. The corresponding gen-
eral symbol is $\sum_r X_{rbc}$. In next to the bottom section of the table
will be found $\sum_b X_{1b1}$ as the sum for all the cases in row 1 and
column 1, the summing being through blocks, i.e., b takes on
values from 1 to B. The general expression for such sums is
$\sum_b X_{rbc}$. The sum of all the scores in the first block is symbolized
as $\sum_r\sum_c X_{r1c}$, and in the bth block as $\sum_r\sum_c X_{rbc}$. For the sum of all
the scores in the first column, irrespective of row and block, we
have $\sum_r\sum_b X_{rb1}$, and the general expression is $\sum_r\sum_b X_{rbc}$. The symbol
$\sum_b\sum_c X_{1bc}$ stands for the sum of all scores in the first row, and $\sum_b\sum_c X_{rbc}$
is the corresponding general expression. Note also how the "dot"
notation is used to specify the several means. The subscript
which has been replaced by a dot indicates the direction of the
addition required to obtain the sum for the given mean. Thus
in $\bar{X}_{\cdot 24}$ the dot replaces r; this mean is based on R scores, with r
running from 1 to R when we sum. The subscripts which are left
denote that the mean is for scores in the second block and fourth
column. The total number of means will be as follows:

RB means of the form $\bar{X}_{rb\cdot}$
RC means of the form $\bar{X}_{r\cdot c}$
BC means of the form $\bar{X}_{\cdot bc}$
R means of the form $\bar{X}_{r\cdot\cdot}$
B means of the form $\bar{X}_{\cdot b\cdot}$
C means of the form $\bar{X}_{\cdot\cdot c}$
One mean of the form $\bar{X}_{\cdot\cdot\cdot}$ = total mean = $\bar{X}$
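The dot notation maps directly onto averaging a 3-way array over the dotted subscripts. The sketch below computes all 7 kinds of means for an array indexed `X[r][b][c]`, using the 3-row, 2-block, 4-column box of Fig. 19. The scores (`100*r + 10*b + c`, 0-based) are hypothetical values chosen only so that the results are easy to check by eye, and `dot_means` is our own name.

```python
def dot_means(X):
    """All marginal means of a 3-way layout X[r][b][c] (1 score per cell).

    Keys follow the dot notation of the text: a dot marks each subscript
    averaged over, so "r.c" holds the RC means of the form X-bar_r.c.
    """
    R, B, C = len(X), len(X[0]), len(X[0][0])

    def mean(values):
        values = list(values)
        return sum(values) / len(values)

    return {
        "rb.": [[mean(X[r][b][c] for c in range(C)) for b in range(B)] for r in range(R)],
        "r.c": [[mean(X[r][b][c] for b in range(B)) for c in range(C)] for r in range(R)],
        ".bc": [[mean(X[r][b][c] for r in range(R)) for c in range(C)] for b in range(B)],
        "r..": [mean(X[r][b][c] for b in range(B) for c in range(C)) for r in range(R)],
        ".b.": [mean(X[r][b][c] for r in range(R) for c in range(C)) for b in range(B)],
        "..c": [mean(X[r][b][c] for r in range(R) for b in range(B)) for c in range(C)],
        "...": mean(X[r][b][c] for r in range(R) for b in range(B) for c in range(C)),
    }


# Hypothetical scores for the 3-row, 2-block, 4-column box (0-based indices).
R, B, C = 3, 2, 4
X = [[[100 * r + 10 * b + c for c in range(C)] for b in range(B)] for r in range(R)]
M = dot_means(X)
print(len(M["rb."]) * B, len(M["r.c"]) * C, len(M[".bc"]) * C)   # 6 12 8
```

The counts 6, 12, and 8 are the RB, RC, and BC means, matching the 6 means on the side, 12 on the front, and 8 on the top of the box described next.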




Perhaps a better appreciation of the meaning of all these means
can be obtained by a study of Fig. 19, which pictures geometrically
the situation for 2 blocks, 3 rows, and 4 columns. The individual
scores can be thought of as in the cubicles of a 2 by 3 by 4 box.
Summing through the box in the vertical direction leads to the
8 means on the top; summing in the forward-backward direction
leads to the 12 means on the front surface; and summing through
right-leftward leads to the 6 means on the side. Summing the
means (or summing sums) across the front leads to the means
placed along the vertical axis for the groups defined by the rows;
summing the means (or sums) downward on the front leads to
the means placed along the right-left axis for the groups defined
by the columns; summing down on the side leads to the means
along the third axis for the groups defined by the blocks. To
get any of these means it is, of course, assumed that the sum in-
volved is divided by the proper number.

[Fig. 19. Geometric picture of 3-way classification.]

Of primary interest is the question: Is the variation among the means along the edges, considered separately, larger than expected on the basis of chance? To answer this we need to break down the sum of squares of deviations from the total mean into appropriate components. The score $X_{rbc}$ in the cubicle defined by the $r$th row, $b$th block, and $c$th column will vary more or less from $\bar X$, and 3 possible sources of variation for $X_{rbc}$ are obvious: the deviation of its row mean, its column mean, and its block mean from $\bar X$. Now, if we recall the situation for double classification, it is fairly obvious that, when the score $X_{rbc}$ is considered as belonging in row $r$ and column $c$, one source of variation becomes the remainder or interaction for rows and columns; considered next as also falling in row $r$ and block $b$, another source of variation is the possible interaction of rows and blocks; and then, thought of as belonging to column $c$ and block $b$, the score also involves the interaction of columns and blocks.
When the sums of squares for these 6 components are added, it will be discovered that they do not sum to the sum of squares for the total; i.e., subtracting these 6 sums from the total sum leaves a remainder. This residual is sometimes referred to as error, more frequently as a triple interaction. This term involves rows, blocks, and columns. The reader, having in mind the idea that the simple row by column interaction has to do with the possible failure of cell entries to be consistent with the 2 sets of marginal means, must now try imagining that the RBC entries in the cubical cells of our box may not be entirely consistent with the 3 sets of means on the edges and with the 3 sets on the surfaces. We have seen that a statistical check on simple interaction is not possible with only 1 entry per cell; similarly, more than 1 score per cubicle is required for testing triple interaction.
Table 52 gives the essentials, in symbols, for the analysis of variance for the triple classification setup. In order to specify the interactions, we here adopt the abbreviation scheme generally used. Thus $R \times B$, read R by B, indicates the row and block interaction, and $R \times B \times C$ stands for the row by block by column or triple interaction. In a given investigation, the rows, blocks, and columns refer to particular independent or classificatory variables.





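Table 52. Variance Table for Triple Classification into R Rows, B Blocks, and C Columns

Source                             Sum of Squares                                                                                                                      df                    Variance Estimate
Rows                               $BC\sum_r(\bar X_{r\cdot\cdot}-\bar X)^2$                                                                                           $R-1$                 $s^2_r$
Blocks                             $RC\sum_b(\bar X_{\cdot b\cdot}-\bar X)^2$                                                                                          $B-1$                 $s^2_b$
Columns                            $RB\sum_c(\bar X_{\cdot\cdot c}-\bar X)^2$                                                                                          $C-1$                 $s^2_c$
R × B interaction                  $C\sum_r\sum_b(\bar X_{rb\cdot}-\bar X_{r\cdot\cdot}-\bar X_{\cdot b\cdot}+\bar X)^2$                                               $(R-1)(B-1)$          $s^2_{rb}$
R × C interaction                  $B\sum_r\sum_c(\bar X_{r\cdot c}-\bar X_{r\cdot\cdot}-\bar X_{\cdot\cdot c}+\bar X)^2$                                              $(R-1)(C-1)$          $s^2_{rc}$
B × C interaction                  $R\sum_b\sum_c(\bar X_{\cdot bc}-\bar X_{\cdot b\cdot}-\bar X_{\cdot\cdot c}+\bar X)^2$                                             $(B-1)(C-1)$          $s^2_{bc}$
R × B × C or triple interaction    $\sum_r\sum_b\sum_c(X_{rbc}-\bar X_{rb\cdot}-\bar X_{r\cdot c}-\bar X_{\cdot bc}+\bar X_{r\cdot\cdot}+\bar X_{\cdot b\cdot}+\bar X_{\cdot\cdot c}-\bar X)^2$    $(R-1)(B-1)(C-1)$    $s^2_{rbc}$
Total                              $\sum_r\sum_b\sum_c(X_{rbc}-\bar X)^2$                                                                                              $RBC-1$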


It will be noted in Table 52 that the df for the triple interaction term is given as $(R-1)(B-1)(C-1)$. The student may be helped in understanding the reasoning which leads to this df by referring again to Fig. 19. The surface means tend to restrict the deviation score values within the box. How many cubical cells can we fill before these restrictions operate? The general rule-of-thumb procedure for determining the df for interaction sums of squares is to take the product of the df's of the variables involved in the given interaction. This holds for simple, triple, and higher-order interactions.
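The product rule can be checked mechanically. Below is a minimal Python sketch (the function name `anova_df` is hypothetical) that applies it to every main effect and interaction of a complete layout, using the dimensions of the Table 54 example (R = 4 observers, B = 2 illuminations, C = 3 albedos), and confirms that with 1 score per cubicle the components exhaust the total df.

```python
from itertools import combinations
from math import prod


def anova_df(levels):
    """Degrees of freedom for every main effect and interaction of a complete
    factorial layout: the product of (levels - 1) over the factors involved."""
    names = list(levels)
    table = {}
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            table[" x ".join(combo)] = prod(levels[name] - 1 for name in combo)
    return table


levels = {"R": 4, "B": 2, "C": 3}  # dimensions of the Table 54 example
df = anova_df(levels)
for source, d in df.items():
    print(f"{source:10s}{d:3d}")

# With 1 score per cubicle the 7 components add up to RBC - 1
print(sum(df.values()) == prod(levels.values()) - 1)  # True
```

The triple interaction receives (4 - 1)(2 - 1)(3 - 1) = 6 df, and the seven components together account for all 23 df.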

SPECIAL CASE WHERE THE ROWS STAND FOR
PERSONS OR MATCHED INDIVIDUALS

Suppose the purpose of a study is to ascertain whether variation on a dependent variable is influenced by or associated with variation on 2 independent variables. This, of course, involves the double classification idea previously discussed, but we are now in a position to accomplish, by means of triple classification, 2 closely related things which could not be done by the simpler double classification scheme:

1. If transfer, practice, fatigue, etc., effects are such that it is permissible to make observations on an individual under each of the RC combinations of conditions, we may increase the precision of an experiment by using only m individuals instead of mRC individuals, as in the illustration involving pursuit learning. Or we may make observations on mRC cases so as to have in each of the RC cells m scores which are based on m sets of matched individuals, thereby reducing error.

2. If we are dealing with a situation in which it is required that observations be made on the same individual in each of the RC conditions, and if more than 1 case is used either to reduce errors or to provide a basis for generalizing to a population, it is necessary that we make statistical allowance for the fact that the RC observations on the m cases are nonindependent, or correlated. This allowance was not possible by the double classification scheme, for which it was assumed that the m scores in 1 cell were independent of the observations in the other cells.

It will be recalled that in the double classification setup, by letting 1 classification refer to R individuals or sets of matched cases, we were provided with an over-all test of significance for several correlated means for groups classified on a single independent variable. Triple classification permits a similar test of correlated means for groups involved in double classification.

Since the assigning of the bases of classification to rows, blocks, and columns is arbitrary, we shall let the R rows stand for R individuals (or R matched persons), with the blocks and columns representing the independent variables to be investigated.

COMPUTATIONAL ILLUSTRATION FOR TRIPLE 
CLASSIFICATION 

The task of computing the required sums of squares (see Table 52) is tedious. The first step is to arrange the data in some such systematic order as that depicted in Table 51 and do the necessary adding to secure the various sums indicated in that table. The total sum of squares for all RBC cases is obtained as usual: sum all the scores, sum all the squared scores, and substitute in the general formula

$$\frac{1}{RBC}\left[RBC\sum_r\sum_b\sum_c X^2_{rbc} - \left(\sum_r\sum_b\sum_c X_{rbc}\right)^2\right]$$

To secure the 3 between-groups and the 3 simple interaction sums of squares, we form 3 subtables involving sums taken in various directions. For the first of these subtables we take row by column sums obtained by adding cell entries from block to block, i.e., through the B blocks. The next to the bottom section of Table 51 contains these row by column sums, which we reproduce here as Table 53a. The reader will note that the values for Table 53b are the right-hand margin sums of Table 51, and that the values for Table 53c are found as the sums in Table 51 along the bottom of each block.
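Each auxiliary subtable is simply the data summed through one direction at a time. A short Python sketch, assuming the same nested-list layout `X[r][b][c]` for the Table 54 scores (the variable names are illustrative), reproduces the cell entries of the example's Tables 55c, 55b, and 55a:

```python
# Table 54 scores, indexed X[r][b][c] (observer, illumination, albedo)
X = [
    [[11, 24, 60], [14, 24, 65]],
    [[22, 26, 44], [27, 36, 47]],
    [[16, 22, 55], [18, 24, 62]],
    [[20, 32, 82], [24, 59, 84]],
]
R, B, C = len(X), len(X[0]), len(X[0][0])

# Row by column sums, adding through the B blocks (Table 55c)
row_by_col = [[sum(X[r][b][c] for b in range(B)) for c in range(C)] for r in range(R)]
# Row by block sums, adding across the C columns (Table 55b, one row per observer)
row_by_block = [[sum(X[r][b][c] for c in range(C)) for b in range(B)] for r in range(R)]
# Block by column sums, adding down the R rows (Table 55a)
block_by_col = [[sum(X[r][b][c] for r in range(R)) for c in range(C)] for b in range(B)]

print(row_by_col)    # [[25, 48, 125], [49, 62, 91], [34, 46, 117], [44, 91, 166]]
print(row_by_block)  # [[95, 103], [92, 110], [93, 104], [134, 167]]
print(block_by_col)  # [[69, 104, 241], [83, 143, 258]]

# Every auxiliary table must add up to the same grand total
print({sum(map(sum, t)) for t in (row_by_col, row_by_block, block_by_col)})  # {898}
```

The final check is a cheap guard against adding errors: all three subtables must share the grand total.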

With these auxiliary tables in mind, we can write the required computational formulas. The simple interaction terms are secured by computing a subtotal sum of squares for each table and then subtracting therefrom the 2 appropriate "between" sums of squares. These subtotal sums of squares will not be the same as the total sum of squares obtained for double classification by formula (102a) because we are now dealing with cell entries which are the sums of scores rather than single scores. Due allowance for this can be made by a slight change in formula (102a). The amended formula, with notation appropriate for and specific to the 3 auxiliary tables, may be written as follows:

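Subtotal: row by column

$$\frac{1}{RBC}\left[RC\sum_r\sum_c\left(\sum_b X_{rbc}\right)^2 - \left(\sum_r\sum_b\sum_c X_{rbc}\right)^2\right] \tag{106a}$$

Subtotal: row by block

$$\frac{1}{RBC}\left[RB\sum_r\sum_b\left(\sum_c X_{rbc}\right)^2 - \left(\sum_r\sum_b\sum_c X_{rbc}\right)^2\right] \tag{106b}$$

Subtotal: block by column

$$\frac{1}{RBC}\left[BC\sum_b\sum_c\left(\sum_r X_{rbc}\right)^2 - \left(\sum_r\sum_b\sum_c X_{rbc}\right)^2\right] \tag{106c}$$

From the right-hand margin of either Table 53a or 53b we can compute the sum of squares for

Between rows:

$$\frac{1}{RBC}\left[R\sum_r\left(\sum_b\sum_c X_{rbc}\right)^2 - \left(\sum_r\sum_b\sum_c X_{rbc}\right)^2\right] \tag{106d}$$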



Table 53a. Required Sums for Row by Column Analysis




























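From the bottom of either Table 53a or 53c we can obtain the sum of squares for

Between columns:

$$\frac{1}{RBC}\left[C\sum_c\left(\sum_r\sum_b X_{rbc}\right)^2 - \left(\sum_r\sum_b\sum_c X_{rbc}\right)^2\right] \tag{106e}$$

From the bottom of Table 53b or from the right-hand margin of 53c we can calculate the sum of squares for

Between blocks:

$$\frac{1}{RBC}\left[B\sum_b\left(\sum_r\sum_c X_{rbc}\right)^2 - \left(\sum_r\sum_b\sum_c X_{rbc}\right)^2\right] \tag{106f}$$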

Then from the above 6 sums of squares the simple interaction sums of squares may be secured by the following subtractions:

Row by column interaction: (106a) - (106d) - (106e)   (107a)

Row by block interaction: (106b) - (106d) - (106f)   (107b)

Block by column interaction: (106c) - (106e) - (106f)   (107c)

And finally, again by subtraction, we have the sum of squares for the row by column by block, or

Triple interaction: total sum of squares minus (106d, e, f) minus (107a, b, c)
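The whole chain of formulas (106a) to (106f) and (107a) to (107c), with the triple interaction found by subtraction, fits in one short function. This is a minimal Python sketch under the assumption of 1 score per cubicle stored as `X[r][b][c]`; the function name `triple_ss` is hypothetical, and the Table 54 scores serve as test data.

```python
# Table 54 scores, indexed X[r][b][c] (observer, illumination, albedo)
X = [
    [[11, 24, 60], [14, 24, 65]],
    [[22, 26, 44], [27, 36, 47]],
    [[16, 22, 55], [18, 24, 62]],
    [[20, 32, 82], [24, 59, 84]],
]


def triple_ss(X):
    """Sums of squares for a triple classification with 1 score per cubicle,
    following formulas (106a)-(106f) and (107a)-(107c); the triple
    interaction is obtained by subtraction from the total."""
    R, B, C = len(X), len(X[0]), len(X[0][0])
    N = R * B * C
    scores = [X[r][b][c] for r in range(R) for b in range(B) for c in range(C)]
    T2 = sum(scores) ** 2  # square of the grand total

    def ss(multiplier, sums):
        # (1/RBC)[multiplier * (sum of squared sums) - (grand total)^2]
        return (multiplier * sum(s * s for s in sums) - T2) / N

    total = ss(N, scores)  # each "sum" is a single score; multiplier RBC
    sub_rc = ss(R * C, [sum(X[r][b][c] for b in range(B)) for r in range(R) for c in range(C)])  # (106a)
    sub_rb = ss(R * B, [sum(X[r][b][c] for c in range(C)) for r in range(R) for b in range(B)])  # (106b)
    sub_bc = ss(B * C, [sum(X[r][b][c] for r in range(R)) for b in range(B) for c in range(C)])  # (106c)
    rows = ss(R, [sum(X[r][b][c] for b in range(B) for c in range(C)) for r in range(R)])  # (106d)
    cols = ss(C, [sum(X[r][b][c] for r in range(R) for b in range(B)) for c in range(C)])  # (106e)
    blocks = ss(B, [sum(X[r][b][c] for r in range(R) for c in range(C)) for b in range(B)])  # (106f)
    result = {
        "rows": rows,
        "blocks": blocks,
        "columns": cols,
        "R x C": sub_rc - rows - cols,    # (107a)
        "R x B": sub_rb - rows - blocks,  # (107b)
        "B x C": sub_bc - blocks - cols,  # (107c)
    }
    result["R x B x C"] = total - sum(result.values())  # triple interaction
    result["total"] = total
    return result


for source, value in triple_ss(X).items():
    print(f"{source:10s}{value:10.2f}")
```

Run on the Table 54 scores, it should reproduce the sums of squares worked out by hand in the illustration that follows: 1302.83 for rows, 204.17 for blocks, 8039.08 for columns, 964.92, 62.17, and 46.58 for the simple interactions, 174.08 for the triple interaction, and 10,793.83 in total.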

We will illustrate the procedure by using the data of Table 54, in which the blocks represent 2 levels of illumination, the columns 3 degrees of albedo, and the rows 4 individuals, and the scores are judged whiteness. Notice that each subject made judgments under all 6 of the combinations of conditions. The sums given in Table 54 become the entries for the auxiliary computational Tables 55a, b, c. The needed value of $\sum_r\sum_b\sum_c X_{rbc}$ is 898, and the sum of all the squared scores, $\sum_r\sum_b\sum_c X^2_{rbc}$, is 44,394. From these figures we have

$$\frac{1}{24}\left[24(44{,}394) - (898)^2\right] = 10{,}793.83 = \text{total sum of squares}$$

The various "between" sums can readily be obtained by adding the squares of the appropriate marginal sums of auxiliary Tables 55a, b, c, and substituting in formulas (106d, e, f).

For between blocks we need $(414)^2 + (484)^2 = 405{,}652$;

For between columns we need $(152)^2 + (247)^2 + (499)^2 = 333{,}114$;

For between rows we need $(198)^2 + (202)^2 + (197)^2 + (301)^2 = 209{,}418$.


Table 54. Data Used in Illustrating Computations for 3-Way Classification: 2 Levels of Illumination (Blocks), 3 Albedos (Columns), and 4 Observers (Rows)*

                                        Albedo
Illumination      Observer       .07       .14       .26       Sum      Mean
1.20              1               11        24        60        95     31.67
                  2               22        26        44        92     30.67
                  3               16        22        55        93     31.00
                  4               20        32        82       134     44.67
                  Sum             69       104       241       414     34.50
                  Mean         17.25     26.00     60.25               34.50
2.00              1               14        24        65       103     34.33
                  2               27        36        47       110     36.67
                  3               18        24        62       104     34.67
                  4               24        59        84       167     55.67
                  Sum             83       143       258       484     40.33
                  Mean         20.75     35.75     64.50               40.33
Sums through      1               25        48       125       198     33.00
blocks            2               49        62        91       202     33.67
                  3               34        46       117       197     32.83
                  4               44        91       166       301     50.17
                  Sum            152       247       499       898     37.42
Means for rows    1            12.50     24.00     62.50               33.00
by columns        2            24.50     31.00     45.50               33.67
                  3            17.00     23.00     58.50               32.83
                  4            22.00     45.50     83.00               50.17
Column means                   19.00     30.87     62.38               37.42

* Data from R. E. Taubman, J. Exp. Psychol., 1945, 35, 235-241.

































Table 55a. Required Sums for Block by Column Analysis

                      Albedo
Illumination    .07    .14    .26    Sum
1.20             69    104    241    414
2.00             83    143    258    484
Sum             152    247    499    898


Table 55b. Required Sums for Row by Block Analysis

                        Individuals
Illumination     1      2      3      4     Sum
1.20            95     92     93    134     414
2.00           103    110    104    167     484
Sum            198    202    197    301     898


Table 55c. Required Sums for Row by Column Analysis

                    Albedo
Individual    .07    .14    .26    Sum
1              25     48    125    198
2              49     62     91    202
3              34     46    117    197
4              44     91    166    301
Sum           152    247    499    898






Then we have

$\frac{1}{24}\left[2(405{,}652) - (898)^2\right] = 204.17$ for between-blocks sum of squares

$\frac{1}{24}\left[3(333{,}114) - (898)^2\right] = 8039.08$ for between-columns sum of squares

$\frac{1}{24}\left[4(209{,}418) - (898)^2\right] = 1302.83$ for between-rows sum of squares

In order to secure the subtotal sums of squares we add the squares of the cell entries in the auxiliary tables. For the block by column subtotal we have from Table 55a

$(69)^2 + (83)^2 + (104)^2 + (143)^2 + (241)^2 + (258)^2 = 167{,}560$

Similarly for the row by block subtotal we have from Table 55b:

$(95)^2 + (103)^2 + \cdots + (167)^2 = 105{,}508$

and for the row by column subtotal we have from Table 55c:

$(25)^2 + \cdots + (44)^2 + \cdots + (166)^2 = 87{,}814$

These 3 sums can now be substituted into formulas (106a, b, c):

$\frac{1}{24}\left[6(167{,}560) - (898)^2\right] = 8289.83$ = block by column subtotal sum of squares

$\frac{1}{24}\left[8(105{,}508) - (898)^2\right] = 1569.17$ = row by block subtotal sum of squares

$\frac{1}{24}\left[12(87{,}814) - (898)^2\right] = 10{,}306.83$ = row by column subtotal sum of squares

Next we get the simple interaction sums of squares by the subtractions indicated in formulas (107a, b, c):

8289.83 - 204.17 - 8039.08 = 46.58 = block by column interaction

1569.17 - 204.17 - 1302.83 = 62.17 = row by block interaction

10,306.83 - 8039.08 - 1302.83 = 964.92 = row by column interaction




Then for the triple interaction sum of squares we have

10,793.83 - 204.17 - 8039.08 - 1302.83 - 46.58 - 62.17 - 964.92 = 174.08
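Because the triple interaction is found only as a remainder, it is reassuring to compute it directly from its defining formula in Table 52. A minimal Python sketch, assuming the `X[r][b][c]` layout of the Table 54 scores (names are illustrative); in a complete design with 1 score per cubicle the direct sum agrees with the subtraction result:

```python
# Table 54 scores, indexed X[r][b][c] (observer, illumination, albedo)
X = [
    [[11, 24, 60], [14, 24, 65]],
    [[22, 26, 44], [27, 36, 47]],
    [[16, 22, 55], [18, 24, 62]],
    [[20, 32, 82], [24, 59, 84]],
]
R, B, C = len(X), len(X[0]), len(X[0][0])


def mean(values):
    """Arithmetic mean of any iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


m_rb = [[mean(X[r][b][c] for c in range(C)) for b in range(B)] for r in range(R)]
m_rc = [[mean(X[r][b][c] for b in range(B)) for c in range(C)] for r in range(R)]
m_bc = [[mean(X[r][b][c] for r in range(R)) for c in range(C)] for b in range(B)]
m_r = [mean(X[r][b][c] for b in range(B) for c in range(C)) for r in range(R)]
m_b = [mean(X[r][b][c] for r in range(R) for c in range(C)) for b in range(B)]
m_c = [mean(X[r][b][c] for r in range(R) for b in range(B)) for c in range(C)]
grand = mean(X[r][b][c] for r in range(R) for b in range(B) for c in range(C))

# Defining formula of the R x B x C term in Table 52
ss_triple = sum(
    (X[r][b][c] - m_rb[r][b] - m_rc[r][c] - m_bc[b][c]
     + m_r[r] + m_b[b] + m_c[c] - grand) ** 2
    for r in range(R) for b in range(B) for c in range(C)
)
print(round(ss_triple, 2))  # 174.08
```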

The several sums of squares, their df's, and the resulting variance estimates are brought together in Table 56.

Table 56. Analysis of Variance for Judged Whiteness by 4 Observers for 3 Degrees of Albedo and 2 Levels of Illumination

Source                                 Sum of Squares    df    Variance Estimate
Illumination                                   204.17     1               204.17
Albedo                                       8,039.08     2             4,019.54
Subjects (individual differences)            1,302.83     3               434.28
Interaction: I × A                              46.58     2                23.29
Interaction: I × S                              62.17     3                20.72
Interaction: A × S                             964.92     6               160.82
Interaction, triple: I × A × S                 174.08     6                29.01
Total                                       10,793.83    23



We are not yet ready to discuss the principles controlling the choice of the error term appropriate for the possible F's. When the models have been presented, the student may check back to see whether we have used, in the next 2 paragraphs, the correct denominator for the F ratio.

First we use the triple interaction as a basis for testing the significance of the simple interactions. Of chief interest in this example is the possible interaction between albedo and illumination, but since this interaction variance is less than that for triple interaction, we know at once without computing F that the interaction is insignificant. The illumination by individual interaction is also insignificant. The interaction of albedo with individuals yields an F of 160.82/29.01 = 5.54, which, for $n_1 = 6$ and $n_2 = 6$, falls between the values of 4.28 and 8.47 for the .05 and .01 levels respectively. This F of 5.54 is high enough to suggest that the form of the relationship between judged whiteness and albedo varies somewhat from person to person.















Now we turn to a test of the main effects. A test of the significance of row differences is a test of individual differences and is accordingly of little interest. For illumination we have F = 204.17/20.72 = 9.85, which falls just short of the 10.13 required for P = .05 with 1 and 3 df, and is therefore only suggestive of a real difference due to illumination. For albedo we have F = 4019.54/160.82 = 24.99, which is highly significant.
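The F ratios of the last 2 paragraphs can be collected in one place. This Python sketch simply takes the rounded sums of squares and df of Table 56 and divides each variance estimate by the denominator the text chose for it; the dictionary keys are abbreviations assumed here (I = illumination, A = albedo, S = subjects), and no probabilities are computed, since the critical values still come from an F table.

```python
# Sums of squares and df as reported in Table 56
table56 = {
    "I": (204.17, 1),       # illumination
    "A": (8039.08, 2),      # albedo
    "S": (1302.83, 3),      # subjects (individual differences)
    "I x A": (46.58, 2),
    "I x S": (62.17, 3),
    "A x S": (964.92, 6),
    "I x A x S": (174.08, 6),
}
ms = {source: ss / df for source, (ss, df) in table56.items()}  # variance estimates

# Effect tested -> denominator used in the text
tests = {
    "I x A": "I x A x S",
    "I x S": "I x A x S",
    "A x S": "I x A x S",
    "I": "I x S",
    "A": "A x S",
}
for effect, error in tests.items():
    F = ms[effect] / ms[error]
    print(f"{effect:6s} F = {F:6.2f} with {table56[effect][1]} and {table56[error][1]} df")
```

The resulting ratios are about 0.80, 0.71, 5.54, 9.85, and 24.99; each must still be compared with the tabled F for its pair of df, such as the 4.28 and 8.47 quoted above for 6 and 6 df.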

Actually, the foregoing results are not to be regarded as conclusive. The data which we have used to illustrate the computations are only a part of more complete data which involved additional degrees of albedo and other levels of illumination. Partly because of space limitations and partly because it is easier to illustrate the computations when only a few rows, columns, and blocks are involved, we have ignored a part of the available data.

It should be kept in mind that this illustration is an example of the use of the triple classification scheme as a method for making allowance for the use of correlated observations in a problem of double classification involving the influence of 2 variables on a third. In this special use of triple classification, in which the rows correspond to individuals, the objective is identical with that in the earlier analysis of pursuit rotor learning (Table 49). The 2 situations are similar in that there are m (or R) scores in each cell; they are different in that the m scores in any one cell for the pursuit learning problem are independent of the m scores in other cells, whereas the R scores in each of the albedo-illumination cells are correlated: each person contributes a score to each cell. Both schemes permit a check on the interaction effect of the 2 independent variables used to classify the observations. The use of BC observations on each of R cases (if feasible) will yield more precise information than that obtainable by having scores for m individuals in each of the BC cells. This is analogous to the well-known principle that experimentation in which individuals serve as their own controls tends to be more precise than that in which an independent control group is set up.

TRIPLE CLASSIFICATION WITH m CASES PER CUBICLE

We have seen how the possible association of a dependent variable with 3 independent variables can be tested by a variance analysis made on a triple classification basis. If one wishes either to base his results on more than RBC observations or to test the significance of the triple interaction, it is necessary to have more than 1 score in each cubicle. This can be accomplished either by assigning m individuals to each of the RBC combinations of conditions, or by using just m individuals with each yielding an observation under all the RBC conditions, or by using m sets of RBC cases with 1 individual of each set assigned to each of the RBC groups. Matching may not be feasible; neither may the securing of RBC observations on each of m individuals be feasible. At times, however, the problem under consideration may require an observation on each individual under all the conditions. Whether m individuals are so used by preference or by necessity, we will have m measurements in each of the RBC cubicles, but in testing the significance of the differences between the means of rows or of columns or of blocks we will be dealing with a situation in which the means are correlated because they are based upon the same individuals. To allow for this fact we would need a quadruple classification setup.

Let us next consider the case in which we have in each cubicle m scores, which are independent of the m scores in other cubicles. The total number of scores will, of course, be mRBC, and the breakdown of the total sum of squares will include the components specified in Table 52 plus a within-cubicles sum of squares. Since each cubicle defines a group, the within-cubicles sum of squares does not differ from previously discussed "within" sums of squares. The formula in this case is

$$\sum X^2 - \frac{1}{m}\sum_r\sum_b\sum_c\left(\sum_i X_{irbc}\right)^2$$

in which it is understood that the $\sum X^2$ term contains mRBC squares and that the subtractive term indicates that we first sum the m scores separately for each cubicle, then square each of these sums, and finally sum all these RBC squared sums. The df for this term will be mRBC - RBC because we are dealing with the deviations of mRBC scores about RBC different means.

With m independent scores per cubicle, the 6 computational formulas (106) need only be modified by the use of 1/mRBC instead of 1/RBC as the factor outside the brackets. It must be understood, however, that the sums within the parentheses of formulas (106) will involve m times as many scores as for the simpler situation with 1 case per cubicle. The computation is again accomplished by auxiliary tables, the main cell entries of which will, of course, also involve sums with m times as many scores. If we think of the orderly arrangement of the original data, as exemplified in Table 51, it will be seen that each cell in the separate block designations will consist of m score entries; i.e., we will have m scores of the type $X_{111}$ or $X_{324}$. A more precise notation would be to let $X_{irbc}$ stand for the score of the $i$th person in the $r$th row and $c$th column of the $b$th block, with $i$ taking on values of 1, 2, ..., m.

Except for the use of 1/mRBC in place of 1/RBC in formulas (106), the computation of the between and simple interaction sums of squares follows exactly the steps outlined for a single score per cubicle. The triple interaction sum of squares is again obtained by subtraction, but now we must also deduct the within-cubicles sum of squares. Note that in the formula of Table 52 which defines the triple interaction term we need to replace $X_{rbc}$ by $\bar X_{rbc}$, the mean of the m scores in the $r$th row and $c$th column of block $b$.
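The within-cubicles formula is easy to verify numerically. A minimal Python sketch, assuming each cubicle's m scores are supplied as one inner list; the function name `within_cubicles` and the small data set are hypothetical, made up only to exercise the arithmetic:

```python
def within_cubicles(cubicles):
    """Within-cubicles sum of squares and df for m independent scores per cubicle.

    cubicles: a list of RBC inner lists, each holding the m scores of one cubicle.
    SS = (sum of all mRBC squared scores) - (1/m)(sum of the RBC squared cubicle sums)
    df = mRBC - RBC
    """
    m = len(cubicles[0])
    if any(len(cell) != m for cell in cubicles):
        raise ValueError("every cubicle must contain the same number m of scores")
    sum_of_squares = sum(x * x for cell in cubicles for x in cell)
    squared_sums = sum(sum(cell) ** 2 for cell in cubicles)
    ss = sum_of_squares - squared_sums / m
    df = m * len(cubicles) - len(cubicles)
    return ss, df


# Hypothetical example: 4 cubicles (say R = 2, B = 1, C = 2) with m = 3 scores each
cells = [[3, 5, 4], [6, 8, 7], [2, 2, 5], [9, 7, 8]]
ss, df = within_cubicles(cells)
# Same result from deviations about each cubicle mean
direct = sum((x - sum(cell) / len(cell)) ** 2 for cell in cells for x in cell)
print(ss, df, direct)  # 12.0 8 12.0
```

The second computation, deviations about each cubicle mean, is the definitional form; the computational form avoids the means and so suits hand or desk-calculator work.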

CHOICE OF ERROR TERM IN 3-WAY CLASSIFICATION 

The general mathematical model for the breakdown of a score in the triple classification setup may be written as

$$X_{rbck} - \bar X = \alpha_r + \beta_b + \gamma_c + (\alpha\beta)_{rb} + (\alpha\gamma)_{rc} + (\beta\gamma)_{bc} + (\alpha\beta\gamma)_{rbc} + e_{rbck}$$

in which the subscripts r, b, and c refer to rows, blocks, and columns, and k takes on values 1 ··· m, there being m independent replications (either of measurement or of individuals) in each cell. The mean value of each term on the right of the equality sign is zero; that is, all values are expressed in deviation units. Note the manner in which the interactive effects are designated. Using notation analogous to that employed in specifying equations (105) from equation (104) for 2-way classification, we may replace α, β, and γ by their Latin equivalents, with capitals A, D, and G representing fixed values (fixed constants model), and with lower-case letters a, d, and g standing for classifications involving sampling (components of variance model). The mixed model would, of course, contain 1 of the lower-case and 2 of the capital letters or 2 of the lower-case and 1 of the capital letters.

Rather than rewrite the model equation with particular Latin letters specifying the particular models, we can indicate the models by the following symbols:

[ADG] for fixed constants model

[adg] for components of variance model

[aDG] and [adG] for mixed models


It is assumed that the $a_r$, $d_b$, $g_c$, $(ad)_{rb}$, $(aD)_{rb}$, $(ag)_{rc}$, $(aG)_{rc}$, $(dg)_{bc}$, $(dG)_{bc}$, $(adg)_{rbc}$, $(adG)_{rbc}$, $(aDG)_{rbc}$, and $e_{rbck}$ are random samples from normally distributed populations of effects having the respective variances $\sigma^2_a$, $\sigma^2_d$, $\sigma^2_g$, $\sigma^2_{ad}$, $\sigma^2_{aD}$, $\sigma^2_{ag}$, $\sigma^2_{aG}$, $\sigma^2_{dg}$, $\sigma^2_{dG}$, $\sigma^2_{adg}$, $\sigma^2_{adG}$, $\sigma^2_{aDG}$, and $\sigma^2_e$ (the variance of $e_{rbck}$, whether $k = 1 \cdots m$ represents measurement replication or involves replication of individuals). Seldom does one have an opportunity to check on the normality of the several interactive effects, a fact which may be disturbing to the reader. No such assumptions are made regarding the effects $A_r$, $D_b$, $G_c$, $(AD)_{rb}$, $(AG)_{rc}$, $(DG)_{bc}$, and $(ADG)_{rbc}$, which are associated with the fixed constants. Since all effects are expressed in terms of deviation units, the sum of each particular set of effects, such as $a_r$ or $A_r$ or $(aD)_{rb}$ or $(DG)_{bc}$, is zero.

In order to choose the appropriate variance estimate for the denominator of F for a given significance test, we again need to indicate just what each possible variance estimate ($s^2$) estimates. A summary statement will be given later regarding the assumption of homogeneity of variance for the several cases involving 3-way classification (p. 335).

Case X. Fixed constants model [ADG], with m different individuals in each of the RBC cubicles. This is a simple straightforward case in which $s^2_w \to \sigma^2_e$ and all the other 7 $s^2$ values are estimates of $\sigma^2_e$ plus a single (possible) effect, the one to be tested. Examples:

$$s^2_r \to \sigma^2_e + \frac{mBC\sum_r A_r^2}{R-1}$$

and

$$s^2_{rb} \to \sigma^2_e + \frac{mC\sum_r\sum_b (AD)^2_{rb}}{(R-1)(B-1)}$$

Thus $s^2_w$ is the proper error term for testing all 3 main effects, all three 2-way interactions, and the 3-way interaction. Generalizations are to the population(s) from which the mRBC persons were drawn, but conclusions regarding main effects, or factors (the α, β, and γ are often spoken of as factors), will need to be qualified in case a given factor is involved in a significant interaction.

Case XI. Fixed constants model [ADG], with 1 individual (measured once) in each of the RBC cubicles, a total of RBC persons. This design yields no $s^2_w$ by which to estimate $\sigma^2_e$, which, as for Case X, is involved in the expected value of each of the other variance estimates; hence no tests of significance are possible unless one can make (and defend) the a priori assumption that the 3-way interaction is zero. If so, $s^2_{rbc}$ would become the error term for testing the main and the 2-way interaction effects. A defensible a priori assumption that any 1 of the 2-way interactions is zero would also provide the desired estimate of $\sigma^2_e$ for testing the main effects and the other interactions. We repeat what was said in the discussion of the error term for the double classification setup: significant interactions are so prevalent in psychology that the a priori assumption of a zero interaction needs to be backed up by very strong logic. It should be noted that if the use of $s^2_{rbc}$ (without justification of the assumption of zero 3-way interaction) leads to a significant F, we can be sure of its significance, since this as an error term will be too large in case of nonzero 3-way interaction (compare with the analogue in double classification, p. 307). What about the risk of a type II error?

Case XII. Fixed constants model [ADG], 1 person per cubicle but each person is measured m times. This leads to an $s^2_w$ which is an estimate of measurement-error variance only, rather than the needed estimate of $\sigma^2_e$. Now it might be thought that this $s^2_w$ could be used to test $s^2_{rbc}$ for the presence of 3-way interaction, but note that since

$$s^2_{rbc} \to \sigma^2_e + \frac{m\sum_r\sum_b\sum_c (ADG)^2_{rbc}}{(R-1)(B-1)(C-1)}$$

division of $s^2_{rbc}$ by $s^2_w$ leads to a noninterpretable F (if significant), because one has no way of knowing whether the significance is due to individual differences or to 3-way interaction (remember that $\sigma^2_e$ contains an error of measurement part). Stated differently, $s^2_{rbc}$ is an estimate in which error of measurement variance, true individual difference variance, and possible 3-way interaction effects are all confounded, a term used to indicate that a given setup does not allow a disentangling of the sources of variation which enter into a particular estimate. By using the score in each cubicle as the average of the m measurements, one can handle Case XII in the manner indicated for Case XI. The same difficulties are encountered; the only advantage of Case XII over Case XI is that the scores, being averages, are more reliable.
Case XIII. Fixed constants model [ADG], with only 1 person supplying all scores, or a score (or scores) under each of the RBC conditions. If we have m measures on the 1 person under each of the possible combinations of conditions, $s^2_w \to \sigma^2_e$ and each of the other 7 variance estimates has an expected value including $\sigma^2_e$ plus an effect. A significant F with $s^2_w$ as the error term permits only the conclusion that repetition of the experiment on this same person would be expected to yield similar results, a "generalization" which has no generality, and hence is worthless. If the 1 person provides only 1 score per cubicle, we won't even have an estimate of $\sigma^2_e$; hence we need to make a priori assumptions about interactions (as for Case XI) in order to "generalize" to this 1 person. Thus Case XIII as a possible experimental design holds no promise.

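Case XIV. Mixed model [aDG]. Typically, this will involve R individuals assigned to the rows with each measured at least once under the BC conditions. We have (with no measurement replication):

$$s^2_r \to \sigma^2_e + \sigma^2_{aDG} + C\sigma^2_{aD} + B\sigma^2_{aG} + BC\sigma^2_a$$

$$s^2_b \to \sigma^2_e + \sigma^2_{aDG} + C\sigma^2_{aD} + \frac{RC\sum_b D_b^2}{B-1}$$

$$s^2_c \to \sigma^2_e + \sigma^2_{aDG} + B\sigma^2_{aG} + \frac{RB\sum_c G_c^2}{C-1}$$

$$s^2_{rb} \to \sigma^2_e + \sigma^2_{aDG} + C\sigma^2_{aD}$$

$$s^2_{rc} \to \sigma^2_e + \sigma^2_{aDG} + B\sigma^2_{aG}$$

$$s^2_{bc} \to \sigma^2_e + \sigma^2_{aDG} + \frac{R\sum_b\sum_c (DG)^2_{bc}}{(B-1)(C-1)}$$

$$s^2_{rbc} \to \sigma^2_e + \sigma^2_{aDG}$$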

Scrutiny of the foregoing expected values indicates that $s^2_{rbc}$ is appropriate for testing all three 2-way interactions, that $s^2_c$ should be tested against $s^2_{rc}$, and $s^2_b$ against $s^2_{rb}$. No test of $s^2_r$ is possible, but this is not serious since it would only be a test of the significance of individual differences.
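The scrutiny just described amounts to a matching rule: an effect can be tested against another variance estimate whose expected value contains everything in its own expected value except the component being tested. The Python sketch below applies that rule to the Case XIV expected values, with each expected value transcribed as a set of labels and coefficients folded into the labels (so `C*aD` stands for $C\sigma^2_{aD}$ and `RC*D` for the fixed block term). The function name and labels are hypothetical, and the rule deliberately ignores error terms built by combining several variance estimates.

```python
def error_terms(ems, own):
    """For each source, return the source whose expected value equals its own
    expected value minus the component being tested, or None if none matches."""
    result = {}
    for source, terms in ems.items():
        target = terms - {own[source]}
        matches = [other for other, t in ems.items() if other != source and t == target]
        result[source] = matches[0] if matches else None
    return result


# Expected values for Case XIV, model [aDG], no measurement replication
ems = {
    "r": {"e", "aDG", "C*aD", "B*aG", "BC*a"},
    "b": {"e", "aDG", "C*aD", "RC*D"},
    "c": {"e", "aDG", "B*aG", "RB*G"},
    "rb": {"e", "aDG", "C*aD"},
    "rc": {"e", "aDG", "B*aG"},
    "bc": {"e", "aDG", "R*DG"},
    "rbc": {"e", "aDG"},
}
# The component each variance estimate is meant to test
own = {"r": "BC*a", "b": "RC*D", "c": "RB*G", "rb": "C*aD",
       "rc": "B*aG", "bc": "R*DG", "rbc": "aDG"}

for source, error in error_terms(ems, own).items():
    print(source, "->", error)
```

The output pairs b with rb, c with rc, and all three 2-way interactions with rbc, while r (and, without measurement replication, rbc) finds no error term, which is the conclusion reached in the text.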

If we had replication of measurements (each person measured m times under each of the BC conditions), we would have an $s^2_w$ as an estimate of $\sigma^2_e$, which would permit a test of the 3-way interaction effect. If $F_{rbc}$ were significant it would only mean that 3-way interaction is, for our sample of R persons, greater than expected on the basis of errors of measurement. As usual (when an estimate of $\sigma^2_e$ is used as the error term), no generalization to a population of individuals is possible. It is a fact, however, that 3-way interactions involving individuals are usually significant; hence having measurement replication does not change the procedure for choosing the error term from that depicted in the just previous paragraph.

Case XV. Mixed model [adG], with 1 score per cell. Situations calling for this model in psychology are not plentiful. Suppose R children are observed under C different social situations by B observers, each of whom rates (on a 10-point scale) each child for a particular aspect of behavior, e.g., social participation. Primary interest would be in the effect of conditions (the $G_c$ effects), with secondary interest in observer bias (the raters being regarded as a sample of observers having "effects") and possible interest in 2-way interaction effects. For model [adG] the meaning of the several variance estimates is, aside from a common term, as follows:

$$s^2_r \to \sigma^2_{adG} + C\sigma^2_{ad} + B\sigma^2_{aG} + BC\sigma^2_a$$

$$s^2_b \to \sigma^2_{adG} + C\sigma^2_{ad} + R\sigma^2_{dG} + RC\sigma^2_d$$

$$s^2_c \to \sigma^2_{adG} + B\sigma^2_{aG} + R\sigma^2_{dG} + \frac{RB\sum_c G_c^2}{C-1}$$

$$s^2_{rb} \to \sigma^2_{adG} + C\sigma^2_{ad}$$

$$s^2_{rc} \to \sigma^2_{adG} + B\sigma^2_{aG}$$

$$s^2_{bc} \to \sigma^2_{adG} + R\sigma^2_{dG}$$

$$s^2_{rbc} \to \sigma^2_{adG}$$


Obviously, the appropriate error term for testing all three 2-way interactions is $s^2_{rbc}$, but trouble is encountered in finding an error term for testing the main effects. Keeping in mind that a test of $s^2_r$ is of trivial importance, we note that if the $(dG)_{bc}$ interaction effect were zero, $s^2_b$ could be tested against $s^2_{rb}$ and $s^2_c$ against $s^2_{rc}$; or if the $(ad)_{rb}$ interaction were zero, $s^2_{bc}$ could be used to test $s^2_b$; and if interaction $(aG)_{rc}$ were zero, $s^2_c$ could be tested by using $s^2_{bc}$. One can scarcely make a priori the assumption that any of these 2-way interactions is zero; in fact, the safest presumption is that none of the 3 is zero. It is frequently asserted that the failure of a 2-way interaction to be significant when tested against $s^2_{rbc}$ can be used to justify the assumption of a zero interaction, but failure to be significant means only that it could be zero. Furthermore, if R and B are small, an interaction would need to be sizable to be detected. This issue, along with a similar one, will be discussed later under the heading "Preliminary tests." Suffice it to say now that model [adG] is not recommended.

Case XVI. Components of variance model [adg]. If anyone finds a situation in which all 3 bases of classification involve sampling, he will need to know that for model [adg] the variance estimates have expected values specifiable from those of model [adG] by replacing G with g, except for the fourth term of $s_c^2$, which becomes $RB\sigma_g^2$. The 2-way interactions can be tested against $s_{rbc}^2$, but there is no way of testing the main effects without making precisely the same assumptions regarding 2-way interactions that were indicated for Case XV.

Case XVII. Mixed model [aDG], but a pseudo 3-way classification. Suppose a sample of R individuals in block 1, a sample of R different individuals in block 2, and so on. The B blocks represent B experimental conditions, the effects of which are to be determined, and at the same time the C columns stand for another factor which is also to be evaluated. The B sets of R individuals are used because it is not feasible to use each person under each block condition. Or suppose the blocks stand for different groups (say, diagnostic) from each of which R cases are drawn at random. We wish to compare the groups and also the C conditions.

Let us re-examine Table 51 in order to determine how to set up the model for this situation. We first note that for Case XIV the variation among the row means ($\bar X_r$) contributes to $s_r^2$ as an estimate of individual difference variation, whereas for Case XVII each of these row means is an average for B different individuals; hence row means do not hold for individuals. We do, however, have individual difference variation within each block, as represented by means of the type $\bar X_{rb}$ (right-hand part of Table 51). Accordingly, we can anticipate a sum of squares for individual differences which will involve combining the sum of squares within each block, i.e., $C\sum_b\sum_r(\bar X_{rb} - \bar X_b)^2$, with RB - B degrees of freedom. The resulting variance estimate may be labeled $s_i^2$ for individual differences.

In ordinary 3-way classification (Case XIV) the B sets of means of the type $\bar X_{rb}$ have to do with row (individual) by block interaction, an interaction which reflects the failure of the individuals to maintain similar score positions from block to block. But with independent cases in each block, no block by row interaction is possible; a person can't react differently from 1 block to another unless he has been measured under more than 1 block condition. Consider next the $\bar X_{rc}$ type of mean at the bottom of Table 51. These means ordinarily enter into row by column interaction, but in the present case each of these means is the average for B different individuals who just happened to have been assigned the same row number. Therefore, there can be no row by column interaction in the usual sense. We have, nevertheless, RB independent individuals in a total of RB (instead of R) rows; hence there could be a meaningful individual by column interactive effect (not testable with 1 score per cell, but present as a source of variation).

What of a possible 3-way interaction involving rows, blocks, and columns? This does not make sense, since an individual can in no way react inconsistently from 1 block condition to another without having been subjected to different block conditions.

With the foregoing in mind, we may write the following specific model for Case XVII:

$$(X_{rbc} - \bar X) = a_i + D_b + G_c + (DG)_{bc} + h_{rbc}$$

in which $a_i$ indicates individual difference effects and $h_{rbc}$ is the remainder after the first 4 parts have been subtracted from $(X_{rbc} - \bar X)$. The several sums of squares and their df's are given in Table 57. Note how the first line differs from the first line of Table 52; note also the similarity of the remainder sum of squares to the remainder (or last) term in equation (101), p. 286, and to the row by column interaction term in Table 46.

Table 57. Modification of Variance Table 52 for Case XVII: R Different and Independent Individuals in Each Block

Source            | Sum of Squares                                                          | df               | Variance Estimate
Individuals*      | $C\sum_b\sum_r(\bar X_{rb} - \bar X_b)^2$                               | $RB - B$         | $s_i^2$
Blocks            | $RC\sum_b(\bar X_b - \bar X)^2$                                         | $B - 1$          | $s_b^2$
Columns           | $RB\sum_c(\bar X_c - \bar X)^2$                                         | $C - 1$          | $s_c^2$
B × C interaction | $R\sum_b\sum_c(\bar X_{bc} - \bar X_b - \bar X_c + \bar X)^2$           | $(B - 1)(C - 1)$ | $s_{bc}^2$
Remainder         | $\sum_r\sum_b\sum_c(X_{rbc} - \bar X_{rb} - \bar X_{bc} + \bar X_b)^2$  | $B(R - 1)(C - 1)$ | $s_{rbc}^2$
Total             | $\sum_r\sum_b\sum_c(X_{rbc} - \bar X)^2$                                | $RBC - 1$        |

*The sum of squares for individuals is computed by substituting m ~ -
b r c

Actually, the remainder in Table 57 involves possible individual by column interaction, composed of ordinary row by column interaction within each block, then summed over blocks.
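The partition in Table 57 can be checked numerically. The sketch below is a minimal Python illustration, assuming a balanced layout stored as `data[b][r][c]` (block, individual within block, column); the function name `case_xvii_anova` and the scores are hypothetical, invented only to show that the five sums of squares add to the total and that the df's add to RBC - 1. It also forms the F ratios that the text goes on to recommend: blocks against individuals, and columns and B × C against the remainder.

```python
# Hypothetical sketch: sums of squares for Case XVII (R different individuals in each
# of B blocks, each measured under all C columns). data[b][r][c] is the score of
# individual r of block b in column c. Invented data; not the book's own example.

def mean(values):
    values = list(values)
    return sum(values) / len(values)


def case_xvii_anova(data):
    B, R, C = len(data), len(data[0]), len(data[0][0])
    cells = [(b, r, c) for b in range(B) for r in range(R) for c in range(C)]
    grand = mean(data[b][r][c] for b, r, c in cells)
    block_m = [mean(x for row in data[b] for x in row) for b in range(B)]
    col_m = [mean(data[b][r][c] for b in range(B) for r in range(R)) for c in range(C)]
    rb_m = [[mean(data[b][r]) for r in range(R)] for b in range(B)]  # individual means
    bc_m = [[mean(data[b][r][c] for r in range(R)) for c in range(C)] for b in range(B)]

    ss = {
        "individuals": C * sum((rb_m[b][r] - block_m[b]) ** 2
                               for b in range(B) for r in range(R)),
        "blocks": R * C * sum((m - grand) ** 2 for m in block_m),
        "columns": R * B * sum((m - grand) ** 2 for m in col_m),
        "bxc": R * sum((bc_m[b][c] - block_m[b] - col_m[c] + grand) ** 2
                       for b in range(B) for c in range(C)),
        # row-by-column interaction within each block, summed over blocks
        "remainder": sum((data[b][r][c] - rb_m[b][r] - bc_m[b][c] + block_m[b]) ** 2
                         for b, r, c in cells),
        "total": sum((data[b][r][c] - grand) ** 2 for b, r, c in cells),
    }
    df = {"individuals": R * B - B, "blocks": B - 1, "columns": C - 1,
          "bxc": (B - 1) * (C - 1), "remainder": B * (R - 1) * (C - 1),
          "total": R * B * C - 1}
    ms = {k: ss[k] / df[k] for k in ss if k != "total"}
    F = {"blocks": ms["blocks"] / ms["individuals"],   # s_b^2 / s_i^2
         "columns": ms["columns"] / ms["remainder"],   # s_c^2 / s_rbc^2
         "bxc": ms["bxc"] / ms["remainder"]}           # s_bc^2 / s_rbc^2
    return ss, df, ms, F


# Invented scores: B = 2 blocks, R = 3 individuals per block, C = 2 columns.
data = [
    [[5, 7], [6, 9], [8, 8]],
    [[4, 6], [7, 10], [5, 9]],
]
ss, df, ms, F = case_xvii_anova(data)
parts = ["individuals", "blocks", "columns", "bxc", "remainder"]
print(sum(ss[k] for k in parts), ss["total"])  # the two agree up to rounding
print(F)
```

Because the design is balanced, the cross-products among the five components vanish, which is why the sums of squares add exactly; with unequal numbers per block this simple partition would not hold.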

The expected values of the several variance estimates are as follows (recall that $\sigma_{aG}^2$ contains $\sigma_e^2$ as a component):

$$
\begin{aligned}
s_i^2 &\to \sigma_{aG}^2 + C\sigma_a^2 \\
s_b^2 &\to \sigma_{aG}^2 + C\sigma_a^2 + RC\,\frac{\sum D_b^2}{B - 1} \\
s_c^2 &\to \sigma_{aG}^2 + RB\,\frac{\sum G_c^2}{C - 1} \\
s_{bc}^2 &\to \sigma_{aG}^2 + R\,\frac{\sum\sum (DG)_{bc}^2}{(B - 1)(C - 1)} \\
s_{rbc}^2 &\to \sigma_{aG}^2
\end{aligned}
$$

From these values we see at a glance that $s_i^2$ is the error term for testing $s_b^2$, a test which is analogous to $s_b^2/s_w^2$ in the 1-way classification setup for the difference between the means of independent groups. For testing $s_c^2$, the remainder estimate, $s_{rbc}^2$, is appropriate. Since $s_{rbc}^2$ is, in part, an estimate of individual by column interaction, we find an analogue in the 2-factor setup (Case VII, p. 309), for which a row by column interaction provides the correct variance estimate for testing column effects when the column means are correlated (based on the same individuals). The remainder variance estimate is also appropriate for testing the B × C interaction. This interaction has a special meaning when B stands for different groups and C stands for C tests all scored in comparable standard score form. The column means for each block are the basis for a given group's profile; hence a test of the B × C interaction tells us whether there are significant differences among the profiles for the B groups.

Caution: Case XVII as here outlined calls for the same number of individuals per block (or group).

Assumption of homogeneity of variance. Cases X, XI, and XII require similar individual difference variance for all cubicles, but only Case X permits a test of the assumption. For Cases XIII, XIV, XV, and XVI it is assumed that error of measurement variance is the same from cubicle to cubicle. The assumption for these cases is not testable unless one has measurement replication with m scores per cubicle. Case XVII assumes that the row variance within blocks is homogeneous from block to block when using $s_i^2$ to test $s_b^2$, and that the row by column interaction within blocks is similar from block to block when testing either $s_c^2$ or $s_{bc}^2$ against $s_{rbc}^2$. Both of these assumptions are testable, since the required within-block estimates can be computed.

PRELIMINARY TESTS AND POOLING 

When we discussed Case XV, we found that certain effects could not be tested without assuming that an interaction is zero. The temptation is to assume an interaction is zero if it fails to be significant when tested against an appropriate error term. The writers of textbooks on mathematical statistics are remarkably mum on this point, presumably because the situation gets too “iffy”: a main effect is significant if it reaches, say, the .05 level, and if a certain interaction was not significant at a specified level. Under such circumstances a P for an effect ceases to have the same meaning as when unencumbered by conditional probabilities.




Note that preliminary tests may have to do with the assumption of zero interaction in the numerator term of F (as for Case XV) or in the denominator term (as for Cases II and XI). Failure to satisfy the assumption of a zero interaction in the numerator will lead to too many “significant” F's. Stated differently, significance for a main effect cannot be safely claimed because the numerator involves a possible confounding of interactive and main effects. As pointed out earlier, failure to satisfy an assumption of zero interaction in the denominator term will lead to too few significant F's, which means that an obtained F possesses greater significance than its P indicates.

Preliminary tests are also used in connection with the “pooling” of sums of squares and of their df's. To understand the meaning of pooling, let us consider Case X, in which all effects are testable against $s_w^2$. The advocated steps are as follows. First, $s_{rbc}^2$ is tested against $s_w^2$. If this F is not significant at, say, the .05 level, the sum of squares for the 3-way interaction term is combined with that of $s_w^2$, with the df's also being summed. Dividing the pooled sum by the pooled df gives another estimate of variance for the error term. This estimate is next used to test the 2-way interactions, which, if insignificant, provide additional sums of squares and df's for adding to the pool already made up.
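The sequence of steps just described can be sketched in Python as follows. This is a minimal illustration, not a recommendation of the procedure: the function name `sequential_pool`, the sums of squares, and especially the cutoffs are hypothetical. The cutoffs merely stand in for the tabled .05-level F values a user would look up for the relevant df's. Every term within a stage is tested against the pool as it stood at the start of that stage, following the text's order (3-way first, then the 2-way interactions).

```python
# Hypothetical sketch of sequential pooling for Case X. Each stage is a list of
# (name, ss, df); terms whose F falls below the cutoff are added to the pool.

def sequential_pool(ss_w, df_w, stages, f_cutoff):
    pooled_ss, pooled_df, pooled_terms = ss_w, df_w, []
    for stage in stages:
        error_ms = pooled_ss / pooled_df  # error estimate used for the whole stage
        to_add = []
        for name, ss, df in stage:
            F = (ss / df) / error_ms
            if F < f_cutoff[name]:        # "not significant" under the chosen cutoff
                to_add.append((name, ss, df))
        for name, ss, df in to_add:
            pooled_ss += ss
            pooled_df += df
            pooled_terms.append(name)
    return pooled_ss / pooled_df, pooled_df, pooled_terms


# Invented sums of squares; the cutoffs are placeholders, NOT values read from an F table.
stages = [
    [("rbc", 20.0, 12)],                                # 3-way interaction first
    [("rb", 30.0, 4), ("rc", 6.0, 3), ("bc", 5.0, 2)],  # then the 2-way interactions
]
cutoffs = {"rbc": 1.9, "rb": 2.5, "rc": 2.7, "bc": 3.1}
error_ms, error_df, pooled = sequential_pool(120.0, 60, stages, cutoffs)
print(pooled, error_df, round(error_ms, 3))
```

The extra df gained here only change the critical F noticeably when the error df are small, and the resulting P's carry the conditional meaning discussed above.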

The claimed advantage of pooling is that the number of degrees of freedom for the denominator, or error, term of F is thereby increased, with a resultant more stable estimate of variance. But whether this procedure provides an improved or better estimate depends, of course, on whether the interactions judged to be insignificant are really zero in the sampled population. Actually, the F based on the pooled values may be either larger or smaller than the F based on the appropriate variance estimate obtained without pooling. When one examines the F table, one sees that the gain in df does not have an appreciable effect, in the sense that a smaller F is required for significance, except when $n_2$ is very small, say less than 8 or 10. It should be clearly noted that the gain in df by pooling does not lead to a reduction in the sampling errors of the means being tested.

The use of preliminary tests as a basis for pooling is not nearly so defensible as textbooks written prior to 1951 would have us believe. The work of Paull† indicates that the usually advocated

† Paull, A. E., On a preliminary test for pooling mean squares in the analysis of variance, Ann. math. Stat., 1950, 21, 539-556.





rule (that when F is less than the value required for the .05 level, pooling is permissible and advisable) is far from satisfactory. He sets up an elaborate set of rules leading to the decision “never pool” or “sometimes pool” or “always pool.” Space does not permit an exposition of his rules here. A simple rule to follow when the df's are equal, or when unequal provided both are greater than 6, is to pool only when F is less than 2. Even when one follows the rules, F's based on pooling do not lead to P's of precisely the same meaning as P's obtained from F's which do not involve pooling.
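The simple rule quoted above can be expressed as a small decision function. This Python sketch is an illustration under the stated conditions only (equal df's, or both df's greater than 6); the function name `simple_pool_rule` and its return shape are hypothetical, and it does not attempt Paull's fuller "never/sometimes/always pool" rules.

```python
# Hypothetical sketch of the simple rule: pool an interaction with the error term
# only when F = (interaction mean square) / (error mean square) is less than 2.

def simple_pool_rule(ss_int, df_int, ss_err, df_err):
    if not (df_int == df_err or (df_int > 6 and df_err > 6)):
        raise ValueError("simple rule stated only for equal df's or both df's > 6")
    F = (ss_int / df_int) / (ss_err / df_err)
    if F < 2.0:
        return True, (ss_int + ss_err) / (df_int + df_err)  # pool: new error estimate
    return False, ss_err / df_err                          # keep the unpooled estimate


print(simple_pool_rule(24.0, 8, 40.0, 20))  # F = 3/2 = 1.5, so pool
print(simple_pool_rule(40.0, 8, 40.0, 20))  # F = 5/2 = 2.5, so do not pool
```

Raising an error outside the stated df conditions, rather than guessing, keeps the sketch within what the text actually claims.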

HIGHER-ORDER CLASSIFICATION

There are times when it is both desirable and feasible to study the variations of a dependent variable associated with variations in more than 3 variables. For such a study the data are classifiable in more than 3 ways. We have already mentioned the setup in which an observation is made on each of m individuals under each of the combinations of conditions defined by rows, blocks, and columns. There will be RBC scores for each individual, and the scores may be classified not only as belonging to a given row and a specified column of a particular block but also as belonging to a certain individual. Although it is easy to make an orderly arrangement of the data for quadruple classification, the required computations become somewhat burdensome. For the situation involving a fourth classification, based either on individuals or on a fourth independent variable, there will be 16 sums of squares: 1 for total, 4 for between groups, 6 for simple interactions, 4 for triple interactions, and 1 for quadruple interaction. When 5 classifications are used we will have sums of squares for the total, 5 betweens, 10 simple interactions, 10 triple interactions, 5 quadruple interactions, and 1 fifth-order interaction. It is not within the scope of this book to outline the computations for these higher-order classifications.§
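The counts above follow a simple pattern: with k classifications, the number of j-way terms is the binomial coefficient C(k, j), so the total number of sums of squares (counting the total itself) is 2 to the power k. The short Python sketch below, with a hypothetical function name `ss_terms`, reproduces the 16 terms for 4 classifications (including the 11 interactions) and the 32 terms for 5.

```python
from math import comb


def ss_terms(k):
    """Number of sums of squares in a complete k-way classification (illustrative)."""
    counts = {"total": 1, "main effects": comb(k, 1)}
    for order in range(2, k + 1):
        counts[f"{order}-way interactions"] = comb(k, order)
    return counts


for k in (3, 4, 5):
    counts = ss_terms(k)
    print(k, counts, "sum =", sum(counts.values()))
```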

The possibilities of the variance technique as a method of extracting from 1 set of data information regarding not only primary effects but also interactions have, at times, led to rather indiscriminate inclusions of variables. For instance, a classification of subjects as male or female may be made in order to determine possible sex differences. Since the typical experiment for which the variance technique is used is likely to be based on a relatively small number of subjects, it is very doubtful whether any information of value will be added to the sum total of the already inconsistent findings concerning sex differences.

§ See Edwards, A. L., and Horst, P., The calculation of sums of squares for interactions in the analysis of variance, Psychometrika, 1950, 15, 17-24.

Those who carry out studies involving more than triple classification encounter great difficulty in interpreting significant higher-order interactions. Some have thought it safe, after ascertaining the sums of squares for the primaries and the simple and triple interactions, to use the remainder variance, which is a composite of untested higher-order interactions, as an error term. Such a practice assumes insignificance for the interactions whose sums of squares are thus allowed to combine, but since there are instances of significant quadruple interaction, the cautious investigator will extract and test all the possible interactions before using such a remainder as the error term for F.

As a matter of fact, the choice of the proper error term for higher-order classifications is, at times, quite complicated. For the simple 4-way setup involving the fixed constants model [ADGH], with m replications of individuals per cubicle, the estimate $s_w^2$ is the correct error term for testing all 4 main effects and all 11 interactions. For the mixed model [aDGH], with a standing for individuals (a typical setup), the main effects are tested against the respective 2-way interactions involving individuals (as in Case XIV, p. 330); the 3-way interactions are tested against the 4-way interaction, but there are no exact tests of the three 2-way interactions involving individuals (which are usually significant when testable). For further detail on the 4-way classification, the reader is referred to the Mentzer paper cited on p. 304.

FACTORIAL AND LATIN SQUARE DESIGNS 

The student who encounters the term “factorial design” will need to know that it is difficult to make a distinction between factorial design and the analysis of variance setups discussed in this chapter. The bases for classification are referred to as factors; the categories within a classification are termed “levels.” Perhaps the term factorial design is inappropriate when 1 basis for classification is persons.

The Latin square design had its origins in agricultural experimentation. If T different treatments (fertilizers) are to be evaluated, a plot of land is laid off into T rows and T columns and the treatments are so assigned that each treatment occurs only once in each row and only once in each column. With Latin letters standing for the treatments, one might have the accompanying square, an examination of which reveals that this is a scheme for balancing out the effects of possible fertility differentials from row to row and also from column to column.

              Columns
Rows      1    2    3    4
I         A    D    B    C
II        B    A    C    D
III       C    B    D    A
IV        D    C    A    B
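The defining property (each treatment once in every row and once in every column) is easy to check mechanically. In the Python sketch below, `SQUARE` transcribes the 4 × 4 square shown above; the function names are hypothetical, and the cyclic construction is just one simple way of producing a T × T square, not a claim about how the book's square was chosen.

```python
SQUARE = ["ADBC", "BACD", "CBDA", "DCAB"]  # the square shown above, one string per row


def is_latin_square(rows):
    n = len(rows)
    symbols = set(rows[0])
    if len(symbols) != n or any(len(r) != n or set(r) != symbols for r in rows):
        return False  # some row is the wrong length or repeats a treatment
    # each column must contain every treatment exactly once
    return all({r[j] for r in rows} == symbols for j in range(n))


def cyclic_latin_square(symbols):
    """One simple way to build a T x T square: shift the row one place each time."""
    return [symbols[i:] + symbols[:i] for i in range(len(symbols))]


print(is_latin_square(SQUARE))                            # True
print(cyclic_latin_square("ABCD"))
print(is_latin_square(["ABCD", "BACD", "CDAB", "DCBA"]))  # False: column 3 repeats C
```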

Some researchers in psychology have used the Latin square principle as a way of balancing the effect of individual differences and order of testing (practice). That is, with T conditions to be evaluated, the rows stand for T individuals and the columns for T orders of testing, with Latin letters representing the T conditions. The design also can be and has been used in lieu of a complete 3-way factorial design when all 3 factors involve the same number of levels. For example, 16 properly arranged observations may be used instead of the 64 observations required for a complete 3-way classification plan with 4 levels per classification. This second use of the Latin square principle is not for the purpose of balancing out the effect of a factor but rather for evaluating the effect of factors which are deliberately varied.

Thus, it would seem that the Latin square design might be very useful in psychology, but before we accept it uncritically (as some advocates have), we need to examine the underlying mathematical model, which may be written as

$$(X_{rct} - \bar X) = \alpha_r + \beta_c + \gamma_t + \varepsilon_{rct}$$

The α, β, and γ refer to row, column, and treatment effects, and $\varepsilon_{rct}$ is a remainder, or residual. It follows from the model that the breakdown of the total sum of squares and degrees of freedom will lead to sums of squares for rows, for columns, and for treatments, each with T - 1 degrees of freedom. These sums of squares will use up 3T - 3 of the total df, $T^2 - 1$; hence there remain $T^2 - 3T + 2 = (T - 1)(T - 2)$ degrees of freedom for the residual sum of squares. The variance estimate based on the residual is used as the error term (denominator) of F when testing $s_r^2$, $s_c^2$, and $s_t^2$.
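The breakdown implied by this model can be illustrated numerically. The Python sketch below reuses the 4 × 4 square shown earlier with invented scores; the function name `latin_square_anova` is hypothetical. It computes the row, column, and treatment sums of squares, obtains the residual directly as what remains after removing additive row, column, and treatment effects, and confirms that the pieces add to the total and that the residual has (T - 1)(T - 2) df.

```python
SQUARE = ["ADBC", "BACD", "CBDA", "DCAB"]
# Invented scores laid out like the square: SCORES[i][j] is the score in row i, column j.
SCORES = [
    [10, 12, 11, 14],
    [9, 13, 12, 15],
    [11, 10, 14, 13],
    [12, 11, 13, 16],
]


def latin_square_anova(square, scores):
    T = len(square)
    cells = [(i, j) for i in range(T) for j in range(T)]
    grand = sum(scores[i][j] for i, j in cells) / T ** 2
    row_m = [sum(scores[i]) / T for i in range(T)]
    col_m = [sum(scores[i][j] for i in range(T)) / T for j in range(T)]
    trt_m = {t: sum(scores[i][j] for i, j in cells if square[i][j] == t) / T
             for t in square[0]}
    ss = {
        "rows": T * sum((m - grand) ** 2 for m in row_m),
        "columns": T * sum((m - grand) ** 2 for m in col_m),
        "treatments": T * sum((m - grand) ** 2 for m in trt_m.values()),
        # residual: what is left after removing additive row, column, and treatment effects
        "residual": sum((scores[i][j] - row_m[i] - col_m[j] - trt_m[square[i][j]]
                         + 2 * grand) ** 2 for i, j in cells),
        "total": sum((scores[i][j] - grand) ** 2 for i, j in cells),
    }
    df = {"rows": T - 1, "columns": T - 1, "treatments": T - 1,
          "residual": T * T - 3 * T + 2, "total": T * T - 1}
    ms_resid = ss["residual"] / df["residual"]
    F = {k: (ss[k] / df[k]) / ms_resid for k in ("rows", "columns", "treatments")}
    return ss, df, F


ss, df, F = latin_square_anova(SQUARE, SCORES)
print(ss, df["residual"], F)
```

The residual here absorbs any interactions that are present, which is exactly the point taken up in the discussion that follows.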

When the foregoing model is compared with that of complete 3-way classification (p. 327), we see a marked difference: the absence of interaction terms. For the Latin square design it is assumed that all interactions are zero. This assumption is necessary because there are not enough degrees of freedom available for taking out possible interactive effects in order to arrive at an error variance estimate appropriate for F. The more important thing here is not that the Latin square does not permit a test of the interactions but that this design does not provide a proper error term unless the interactions are in fact zero.

Why doesn't the residual provide a suitable error term when 1 or more of the interactions is not zero? In considering this question we must distinguish between the fixed constants model [ADG], which is applicable when the Latin square is used in place of a complete 3-way factorial design, and the mixed model [aDG], which is called for when, say, rows stand for individuals.
We have already seen that for the mixed model the 2 main effects are tested by $F_c = s_c^2/s_{rc}^2$ and $F_b = s_b^2/s_{rb}^2$ when we have a complete 3-way classification (Case XIV, p. 330). If for Case XIV we pooled the sums of squares for the three 2-way interactions and the 3-way interaction, we would have a residual term exactly analogous to the residual of the Latin square design. Note that the variance estimate obtained by pooling sums of squares and df's (Case XIV) is equivalent to calculating a weighted average of the 4 interaction variance estimates. That is, the pooled sum of squares is equal to

$$(R - 1)(B - 1)s_{rb}^2 + (R - 1)(C - 1)s_{rc}^2 + (B - 1)(C - 1)s_{bc}^2 + (R - 1)(B - 1)(C - 1)s_{rbc}^2$$

which, when divided by the sum of the degrees of freedom (sum of the weights), gives the residual variance estimate.
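The weighted-average form of the pooled residual can be made concrete. In this Python sketch the function name `pooled_residual` and all variance estimates are hypothetical, chosen so that the individual-by-block and individual-by-column estimates exceed the 3-way estimate. The weights sum to RBC - R - B - C + 2, the df left after the 3 main effects are removed, and the pooled value falls below both $s_{rb}^2$ and $s_{rc}^2$.

```python
def pooled_residual(R, B, C, s2_rb, s2_rc, s2_bc, s2_rbc):
    """Pool the 4 interaction estimates of Case XIV: a df-weighted average."""
    weighted = [
        ((R - 1) * (B - 1), s2_rb),
        ((R - 1) * (C - 1), s2_rc),
        ((B - 1) * (C - 1), s2_bc),
        ((R - 1) * (B - 1) * (C - 1), s2_rbc),
    ]
    total_df = sum(w for w, _ in weighted)
    return sum(w * s2 for w, s2 in weighted) / total_df, total_df


# Invented values with sizable individual-by-block and individual-by-column interaction.
s2_pooled, df_pooled = pooled_residual(R=10, B=3, C=4,
                                       s2_rb=12.0, s2_rc=9.0, s2_bc=6.0, s2_rbc=4.0)
s2_b = 30.0  # invented block variance estimate
print(round(s2_pooled, 3), df_pooled)
print(round(s2_b / 12.0, 2), round(s2_b / s2_pooled, 2))  # proper F versus pooled F
```

With these invented numbers the F based on the pooled residual is noticeably larger than the F based on the proper error term $s_{rb}^2$.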

Will such an estimate, as the error term, be larger or smaller than the estimates $s_{rb}^2$ and $s_{rc}^2$, which are the proper error terms for testing $s_b^2$ and $s_c^2$? If there are 2-way interactive effects present, the value of $s_{rbc}^2$ will tend in general to be smaller than either $s_{rb}^2$ or $s_{rc}^2$. Accordingly, when we pool, or take a weighted average, we will have an average which has been pulled down by the relatively small value of $s_{rbc}^2$. (It would also be affected by the presence or absence of B × C interaction.) This means that the variance estimate based on the pooled residual will tend to be smaller than either $s_{rb}^2$ or $s_{rc}^2$; hence the use of the residual variance as the error term for Case XIV will produce too many “significant” F's. Likewise, and for precisely the same reasons, the use of the residual variance of the Latin square design as the error term for testing main effects will produce too many “significant” F's if the 2-way interactions involving individuals are nonzero. This is true because, in effect, the residual variance of the Latin square is a weighted average analogous to that obtained by pooling for Case XIV. When it is recalled that tested interactions involving individuals are nearly always significant, we see that F's derived from Latin squares (mixed model) are not only not dependable but also apt to be fallacious.

Let us next consider Latin squares used in lieu of 3-way factorial designs when all the effects involve the fixed constants model [ADG]. It will be recalled that for complete 3-way classification the proper error variance for testing all effects is $s_w^2$ when there are m cases per cubicle (Case X, p. 328), and $s_{rbc}^2$ when we do not have replication, provided it can be assumed that the 3-way interaction is zero (Case XI). If for the 3-way factorial setup we extracted the sum of squares for each of the 3 main effects, and then took the remainder as a residual, we would, of course, have the exact equivalent of a pooled sum of squares involving the three 2-way interactions, the 3-way interaction, and the within-cubicles variation (if we had replication). As in the mixed model, the variance estimate based on such a residual will be a weighted average of $s_{rb}^2$, $s_{rc}^2$, $s_{bc}^2$, $s_{rbc}^2$, and $s_w^2$. But this time the weighted average will tend to be higher than the proper error variance, $s_w^2$, if any 1 of the 4 interactions is not zero in the population. Thus, for the fixed constants model the use of the residual variance estimate as the denominator of F will tend to give too few “significant” F's. This tendency to underestimate significance will become greater as more of the 4 interactions fail to be zero, and, furthermore, the larger the interaction(s), the greater will be the underestimation.

Since the residual variance in the Latin square is precisely analogous to the weighted average just discussed, we have the inescapable conclusion that too few significant F's will emerge when the Latin square design is used in place of a 3-way factorial design (fixed constants model) if the assumption of zero interaction does not hold for any of the 4 possible interactions. Since the Latin square design does not provide data for testing the assumption of zero interactions, its use cannot be defended unless there are strong a priori reasons for believing that all 4 interactions are really zero. The only consolation left for the user of the Latin square (fixed constants model) is that an obtained significant F may possess greater significance than indicated by the F table, but even this solace is ephemeral if it occurs to him that the assumptions might just happen to have been met. Actually, though one can trust a significant F, one cannot safely claim added significance unless there is reason for suspecting real interaction. The most telling objection to the fixed constants Latin square is that its use stacks the cards against the obtaining of significant F's: the null hypothesis will all too often be falsely accepted.



CHAPTER 17 


Analysis of Variance: Covariance Method 


It is usually possible in experimentation to choose, either by random methods or by pairing or matching, groups that are comparable on variables judged relevant to the comparisons to be made. There are times, however, when it is more practicable to use intact groups which may differ in important respects, and occasionally one may wish to make an unanticipated comparison which does not seem justifiable in light of known differences between groups. Experimental control is the ideal, but, if this cannot be attained, one may resort to statistical allowances and thereby arrive at valid conclusions.

Suppose that 2 intact groups are being used to evaluate the 
relative merits of 2 methods of memorizing and that the mean 
IQ is 105 for group A and 111 for group B. Now, if there is an 
appreciable correlation between the particular memorizing ability 
involved and intelligence, the results will need qualifying because 
of the difference in intelligence of the 2 groups. It would seem 
logical to use the regression equation, for estimating memory 
score from intelligence, as a basis for predicting how much of a 
difference in memorizing would arise because of the group dif- 
ference in IQ's. Let us suppose that the mean memory perform- 
ance is 60 for group A and 70 for group B, and that substituting 
105 and 111 in the regression equation yields a predicted value of
62 for group A and of 68 for group B Thus our prediction would 
lead us to expect a difference of 6 points, and accordingly it would 
be said that 6 of the obtained difference of 10 could be attributed 
to lack of comparability of the 2 groups with respect to intelligence. 
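The arithmetic of this example can be sketched as an adjustment of each group's memory mean to a common IQ value. The slope of 1 memory point per IQ point is not stated in the text; it is inferred from the predicted values 62 (at IQ 105) and 68 (at IQ 111). The common IQ of 108 assumes the two groups are the same size, and the function name `adjusted_means` is hypothetical. The adjusted difference, 4 points, is the obtained 10 less the 6 attributed to intelligence.

```python
def adjusted_means(dep_means, cov_means, slope, common_cov):
    """Shift each group's dependent-variable mean to what the regression predicts
    it would be if the group's covariate mean were common_cov."""
    return [d - slope * (c - common_cov) for d, c in zip(dep_means, cov_means)]


memory = [60, 70]                  # group A, group B (from the text)
iq = [105, 111]
slope = (68 - 62) / (111 - 105)    # inferred from the text's predicted values 62 and 68
common_iq = sum(iq) / len(iq)      # assumes the two groups are the same size

adj = adjusted_means(memory, iq, slope, common_iq)
print(adj, adj[1] - adj[0])        # the adjusted difference is 10 - 6 = 4
```

Adjusting both means to a common covariate value, rather than subtracting the predicted difference directly, gives the same difference here and extends naturally to more than two groups.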

The next question concerns the proper sampling error to use 
in evaluating the adjusted difference. It should be obvious that 
the ordinary procedure is inapplicable for the simple reason that 


we have tampered with the obtained means and in so doing have 
interfered somewhat with the operation of chance. 

It is the purpose of this chapter to give a precise method for making allowance for an uncontrolled variable and to set forth the sampling error adjustment which is needed in testing the statistical significance of the difference between “corrected” means. The method is applicable whenever it seems desirable to correct a difference on a dependent variable for a known difference on another variable which for some reason could not be controlled by matching or by random sampling procedures. Since the scheme about to be proposed has an analysis of variance setting, the reader can readily guess that it will provide an adjustment for, and a test of significance of, the differences between two or more groups, and that it will be usable for either large or small samples. It is assumed that the dependent variable has a distribution which does not depart too far from the normal type and that the variances from group to group are similar.

In order to present the required adjustments, we need first to consider covariance, which is defined as $\sum xy/N$ or $\sum(X - \bar X)(Y - \bar Y)/N$. The sum of products of deviations can be broken down into components in a manner similar to that used with a sum of squares. In the simplest situation we can have m pairs of X and Y scores in each of k groups. These pairs of scores can be recorded in some such fashion as that depicted in Table 58. Note that $X_{ij}$ and $Y_{ij}$ stand for the X and Y values of the ith individual in the jth group.

Table 58. Schema of Scores for Covariance

                                 Group
       1             2             3          ...        k
X11    Y11     X12    Y12     X13    Y13      ...   X1k    Y1k
X21    Y21     X22    Y22     X23    Y23      ...   X2k    Y2k
 .      .       .      .       .      .              .      .
Xi1    Yi1     Xi2    Yi2     Xi3    Yi3      ...   Xik    Yik
 .      .       .      .       .      .              .      .
Xm1    Ym1     Xm2    Ym2     Xm3    Ym3      ...   Xmk    Ymk

Note also that in allowing i to take on values running from 1 to m we do not imply any order for the individuals, and that the ith individual in one group is in no sense paired with the ith case in another group. The product of the deviation scores for the ith individual in the jth group would be $(X_{ij} - \bar X)(Y_{ij} - \bar Y)$, in which $\bar X$ and $\bar Y$ are the means for all km cases. The total sum of products would be $\sum_j\sum_i(X_{ij} - \bar X)(Y_{ij} - \bar Y)$. Now each deviation can be expressed in terms of two components in exactly the same way as in Chapter 15; i.e., one part is the deviation of the score from the mean of the group to which it belongs, and the other part is the deviation of the group mean from the total mean. Thus we have

$$(X_{ij} - \bar X) = (X_{ij} - \bar X_j) + (\bar X_j - \bar X)$$

and

$$(Y_{ij} - \bar Y) = (Y_{ij} - \bar Y_j) + (\bar Y_j - \bar Y)$$

Then the above sum of the products becomes

$$\sum_j\sum_i\left[(X_{ij} - \bar X_j) + (\bar X_j - \bar X)\right]\left[(Y_{ij} - \bar Y_j) + (\bar Y_j - \bar Y)\right]$$

When the bracketed expressions are multiplied together, four terms result, and, since two of these vanish, we have left that the total sum of products is equal to

$$\sum_j\sum_i(X_{ij} - \bar X_j)(Y_{ij} - \bar Y_j) + m\sum_j(\bar X_j - \bar X)(\bar Y_j - \bar Y)$$

The first of these terms involves a within-groups sum of products, whereas the second is for between groups. If there happens to be an unequal number of cases per group, the m of the second term goes under the summation sign as $m_j$. The degrees of freedom for the total sum of products is km - 1, or N - 1, where N is the sum of the $m_j$'s; the df's for the within and between terms are km - k (or N - k) and k - 1, respectively.
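The decomposition of the total sum of products can be verified on a small example. The Python sketch below uses invented data with unequal group sizes (so that $m_j$ sits under the summation sign, as described above); the function name `sums_of_products` is hypothetical. It shows that the within and between sums of products add to the total and reports the three df's.

```python
def sums_of_products(groups):
    """groups: list of groups, each a list of (x, y) pairs; group sizes may differ."""
    pairs = [p for g in groups for p in g]
    N, k = len(pairs), len(groups)
    mx = sum(x for x, _ in pairs) / N
    my = sum(y for _, y in pairs) / N
    total = sum((x - mx) * (y - my) for x, y in pairs)
    within = between = 0.0
    for g in groups:
        m = len(g)
        gx = sum(x for x, _ in g) / m
        gy = sum(y for _, y in g) / m
        within += sum((x - gx) * (y - gy) for x, y in g)
        between += m * (gx - mx) * (gy - my)  # m_j goes under the summation sign
    df = {"total": N - 1, "within": N - k, "between": k - 1}
    return total, within, between, df


# Invented scores for k = 3 groups of sizes 3, 2, and 4.
groups = [[(3, 5), (4, 4), (6, 7)], [(5, 6), (7, 9)], [(8, 8), (9, 11), (10, 10), (6, 6)]]
total, within, between, df = sums_of_products(groups)
print(round(total, 6), round(within + between, 6), df)
```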

It will be of convenience to assemble in a table the sums of products, along with the sums of squares, for both the X and Y variables. These will be found in the first three lines of Table 59.

Although we are here presenting the covariance technique as a method for making such adjustments as discussed in introducing this chapter, it is of interest to link covariance with the problem of correlation. The product moment correlation coefficient is usually defined as

$$r = \frac{\sum xy}{N\sigma_x\sigma_y}$$

which may be written as

$$r = \frac{\sum xy}{\sqrt{\sum x^2}\sqrt{\sum y^2}} = \frac{\sum(X - \bar X)(Y - \bar Y)}{\sqrt{\sum(X - \bar X)^2}\sqrt{\sum(Y - \bar Y)^2}}$$


Adjusted sums: $(B_t - A_t^2/C_t)$ minus $(B_w - A_w^2/C_w)$ equals adjusted $B_b$

or as a function of a sum of products and two sums of squares. Using the sums of Table 59, we may specify three correlations: one based on the total sums, one based on the within sums, and one based on the between sums. These three correlations are indicated in line 5 by letters A, B, and C, with appropriate subscripts used to designate the several sums in the first three lines of the table. Line 6a gives the df's for the r's.

Note that the between-groups r is actually the correlation between the X means and the Y means for the groups. If this r is significant, it follows that one source of the correlation for the total group is the heterogeneity resulting from the throwing together of groups with unlike means. (This between-groups correlation is meaningless when only two groups are involved. Why?) Stated differently, an appreciable between-groups r indicates that the total r is spurious; this spuriousness is eliminated when r is computed from the within sums. The similarity of the within-groups r to the partial correlation coefficient will be recognized by the discerning student, especially if he recalls the derivation of the latter.
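The spuriousness described here is easy to produce deliberately. In the invented data below, X and Y are uncorrelated inside every group, yet the group means rise together, so the total r is high while the within-groups r is zero. The Python sketch, with hypothetical function names `covariance_sums` and `r_from_sums`, computes each r as $A/\sqrt{BC}$, where A is the sum of products and B and C are the sums of squares for X and Y, following the lettering of Table 59 as described in the text.

```python
from math import sqrt


def covariance_sums(groups):
    """Return {"total"|"within"|"between": [A, B, C]}: A the sum of products,
    B the sum of squares for X, C the sum of squares for Y."""
    pairs = [p for g in groups for p in g]
    N = len(pairs)
    mx = sum(x for x, _ in pairs) / N
    my = sum(y for _, y in pairs) / N
    out = {key: [0.0, 0.0, 0.0] for key in ("total", "within", "between")}
    for g in groups:
        m = len(g)
        gx = sum(x for x, _ in g) / m
        gy = sum(y for _, y in g) / m
        for x, y in g:
            for key, dx, dy in (("total", x - mx, y - my), ("within", x - gx, y - gy)):
                out[key][0] += dx * dy
                out[key][1] += dx * dx
                out[key][2] += dy * dy
        out["between"][0] += m * (gx - mx) * (gy - my)
        out["between"][1] += m * (gx - mx) ** 2
        out["between"][2] += m * (gy - my) ** 2
    return out


def r_from_sums(A, B, C):
    return A / sqrt(B * C)


# Invented data: within each group X and Y are uncorrelated, but group means rise together.
groups = [
    [(1, 2), (2, 1), (2, 3), (3, 2)],
    [(6, 5), (7, 4), (7, 6), (8, 5)],
    [(11, 13), (12, 12), (12, 14), (13, 13)],
]
sums = covariance_sums(groups)
for key in ("total", "within", "between"):
    print(key, round(r_from_sums(*sums[key]), 3))
```

With only two groups the between-groups r would necessarily be +1 or -1, which is one way to see why the text calls it meaningless in that case.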

We now turn to the use of covariance as a basis for allowing for the influence of an uncontrolled variable on the differences between group means. The question here is not what the result would be if the uncontrolled variable were held constant, as in partial correlation, but rather what the result would be if the groups were made comparable with respect to the uncontrolled variable. Let X represent the dependent variable, and Y the uncontrolled variable. It is presumed that the $\bar Y_j$ values differ, and that X is correlated with Y in a linear fashion. For purposes of exposition we shall refer to Table 59, which will serve as an outline of the required computations. Line 6 of this table gives the regression coefficients ($b_{xy}$) for predicting X from Y. Since no use will be made of $A_t/C_t$, it is bracketed; it need not be computed.

That these A/C values are regression coefficients can readily be demonstrated. In Chapter 7 the regression of X on Y was given as

$$b_{xy} = r\,\frac{\sigma_x}{\sigma_y}$$


Analysis of Variance: Covariance Method 
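Since, as we have seen above,

$$r = \frac{\Sigma xy}{\sqrt{\Sigma x^2\,\Sigma y^2}}, \qquad \sigma_x = \sqrt{\Sigma x^2/N}, \qquad \sigma_y = \sqrt{\Sigma y^2/N}$$

we have

$$b_{xy} = \frac{\Sigma xy}{\sqrt{\Sigma x^2\,\Sigma y^2}} \cdot \frac{\sqrt{\Sigma x^2/N}}{\sqrt{\Sigma y^2/N}} = \frac{\Sigma xy}{\Sigma y^2} = \frac{A}{C}$$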
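The algebra can be checked numerically. The Python sketch below uses a small set of hypothetical scores (not from the book) and shows that the textbook form r·σx/σy and the sums form A/C give the same regression coefficient of X on Y.

```python
import math

# Hypothetical paired scores, for illustration only.
x = [4, 7, 5, 9, 6, 8]
y = [2, 5, 4, 8, 5, 6]
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

a = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))  # A: sum of products
b = sum((xi - mean_x) ** 2 for xi in x)                          # B: SS for X
c = sum((yi - mean_y) ** 2 for yi in y)                          # C: SS for Y

r = a / math.sqrt(b * c)
sigma_x = math.sqrt(b / n)
sigma_y = math.sqrt(c / n)

b_xy_textbook = r * sigma_x / sigma_y  # r * sigma_x / sigma_y
b_xy_sums = a / c                      # A / C
print(b_xy_textbook, b_xy_sums)
```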


In order to make allowance for the uncontrolled differences in $\bar{Y}_j$, we need not only to adjust the $\bar{X}_j$ values but also to make an adjustment to the error term, which is used as the denominator of the F ratio in testing the difference between the adjusted X means. As in the simpler situation of Chapter 15, F will involve the ratio of a between-groups to a within-groups variance estimate.

First, let us consider the method of making the adjustment to the total and to the within-groups variance estimates. The problem here is that of specifying how much of the variation in X can be predicted from variation in Y and then of subtracting this to secure the left-over variation as an adjusted value. But this left-over variance is nothing more than the residual variance, or square of the standard error of estimate, obtainable from formula (35):

$$\sigma_{x\cdot y}^2 = \sigma_x^2 - r^2\sigma_x^2$$


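Actually the adjustment is to be made to the sum of squares. In order to state the residual variance in terms of sums, we may substitute for $\sigma_x^2$ and $r^2$. Thus,

$$\sigma_{x\cdot y}^2 = \frac{\Sigma x^2}{N} - \frac{(\Sigma xy)^2}{\Sigma x^2\,\Sigma y^2}\cdot\frac{\Sigma x^2}{N}$$

hence,

$$N\sigma_{x\cdot y}^2 = \Sigma x^2 - \frac{(\Sigma xy)^2}{\Sigma y^2}$$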


Since $N\sigma^2$ always equals a sum of squares, the value of $N\sigma_{x\cdot y}^2$ is obviously the sum of squares for the residuals. In the notation of this chapter,

$$\Sigma\Sigma(X_{ij} - \bar{X})^2 - \frac{\left[\Sigma\Sigma(X_{ij} - \bar{X})(Y_{ij} - \bar{Y})\right]^2}{\Sigma\Sigma(Y_{ij} - \bar{Y})^2}$$

would be the residual sum of squares after the regression adjustment. This sum can be written as

$$B_t - \frac{A_t^2}{C_t}$$

which is the entry for the total group in line 7 of Table 59. Similarly, the corresponding residual, or adjusted, sum of squares for within groups is $B_w - A_w^2/C_w$.
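As a check on the shortcut, here is a Python sketch with hypothetical scores (not from the book): it fits the least-squares line predicting X from Y, sums the squared residuals directly, and compares that sum with B − A²/C and with Nσ²x(1 − r²).

```python
import math

# Hypothetical paired scores, for illustration only.
x = [4, 7, 5, 9, 6, 8]
y = [2, 5, 4, 8, 5, 6]
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
a = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))  # A
b = sum((xi - mean_x) ** 2 for xi in x)                          # B
c = sum((yi - mean_y) ** 2 for yi in y)                          # C

slope = a / c                        # b_xy = A / C
intercept = mean_x - slope * mean_y
direct = sum((xi - (intercept + slope * yi)) ** 2 for xi, yi in zip(x, y))

shortcut = b - a ** 2 / c            # B - A^2 / C
r = a / math.sqrt(b * c)
via_r = n * (b / n) * (1 - r ** 2)   # N * sigma_x^2 * (1 - r^2)
print(direct, shortcut, via_r)
```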

At first thought it would seem logical to adjust $B_b$ by the use of $A_b$ and $C_b$, but the between-groups correlation (and regression) is affected by the differences between the X means, which are the differences to be adjusted and then tested for statistical significance. Our adjustment should be one which is independent of the differences to be tested. This suggests that the regression for within groups, or $A_w/C_w$, should be used, since the regression for the total is also affected by the difference which we are out to test. In so far as we are concerned solely with the adjustment of the between-groups X means, the best adjustment would be by means of the within-groups regression. This could take the form of either an adjustment to the between-groups sum of squares for X or a direct adjustment to the several $\bar{X}_j$ values.

Although the latter would be the best way of ascertaining how much of an effect the noncomparability of the groups with respect to Y had upon the X means, there is another consideration as to whether the within regression is appropriate for adjusting the between-groups sum of squares. It will be recalled that F is to be taken as the ratio of a variance estimate based on the between sum of squares to that based on within groups, and that the two variance estimates being so compared must be independent estimates. Now, if we adjust both the within and the between sum of squares by means of the same regression coefficient (say, that based on within groups), any sampling error in this regression coefficient would have a similar effect on both adjustments; hence it could not be argued that the resulting adjusted sums of squares possess the requisite independence. Therefore variance estimates based thereon would not be strictly independent.
This difficulty is overcome by taking the adjusted sum of squares for between groups as the difference between the adjusted total sum and the adjusted within sum of squares. Thus, for the purpose of testing significance,

$$\left(B_t - \frac{A_t^2}{C_t}\right) - \left(B_w - \frac{A_w^2}{C_w}\right)$$

leads to the proper adjustment for the between sum of squares for X.

Perhaps the reader has anticipated that the df's may change as a result of these manipulations. The new df's are recorded in line 8 of Table 59. Note that the df for the between sum has not changed, since the adjustment was not made by using the between-groups regression.

Aside from the usual methods for calculating sums of squares, we need formulas for computing sums of products in terms of raw scores. The following formulas are written for unequal $m_j$ values, but are of course applicable for equal m's.

$$\Sigma\Sigma(X_{ij} - \bar{X})(Y_{ij} - \bar{Y}) = \Sigma\Sigma X_{ij}Y_{ij} - \frac{\Sigma\Sigma X_{ij}\,\Sigma\Sigma Y_{ij}}{N} \quad \text{for total} \tag{108a}$$

$$\Sigma\Sigma(X_{ij} - \bar{X}_j)(Y_{ij} - \bar{Y}_j) = \Sigma\Sigma X_{ij}Y_{ij} - \sum_j \frac{\Sigma X_{ij}\,\Sigma Y_{ij}}{m_j} \quad \text{for within} \tag{108b}$$

$$\sum_j m_j(\bar{X}_j - \bar{X})(\bar{Y}_j - \bar{Y}) = \sum_j \frac{\Sigma X_{ij}\,\Sigma Y_{ij}}{m_j} - \frac{\Sigma\Sigma X_{ij}\,\Sigma\Sigma Y_{ij}}{N} \quad \text{for between} \tag{108c}$$

Thus to compute the sums of products of deviations, we need the sum of all N raw score products or $\Sigma\Sigma X_{ij}Y_{ij}$, the sum of all the X's or $\Sigma\Sigma X_{ij}$, the sum of all the Y's or $\Sigma\Sigma Y_{ij}$, the sum of the X's separately for each group or $\Sigma X_{ij}$, and the sum of the Y's for each separate group or $\Sigma Y_{ij}$. Adding the several X sums gives the sum of all the X's; likewise for Y's. Note that to get the second term of (108b), or the first term of (108c), we must divide the product of the two sums for a group by its m and then sum such quotients over all k groups. The reader may find some interest in comparing formulas (108) with formulas (98), and it should be apparent that in the case of equal m's formulas (108) can be written in the simpler way of formulas (97).
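The raw-score formulas (108) translate directly into code. The Python sketch below (the function name `sums_of_products` is hypothetical) accepts groups of any sizes, so unequal $m_j$ are handled, and passing the same scores as both X and Y gives the corresponding sums of squares. The data are the Table 60 scores.

```python
def sums_of_products(groups):
    """Total, within, and between sums of products by formulas (108a-c).

    groups: list of (x_scores, y_scores) pairs, one pair per group;
    the groups may differ in size.
    """
    n = sum(len(xs) for xs, _ in groups)
    sum_x = sum(sum(xs) for xs, _ in groups)  # sum of all X's
    sum_y = sum(sum(ys) for _, ys in groups)  # sum of all Y's
    sum_xy = sum(xi * yi for xs, ys in groups for xi, yi in zip(xs, ys))
    # Sum over groups of (group X sum)(group Y sum) / m_j
    group_term = sum(sum(xs) * sum(ys) / len(xs) for xs, ys in groups)
    correction = sum_x * sum_y / n
    total = sum_xy - correction        # (108a)
    within = sum_xy - group_term       # (108b)
    between = group_term - correction  # (108c)
    return total, within, between


# Table 60: (X scores, Y scores) for each of the 3 groups.
table_60 = [
    ([10, 6, 8, 6, 9, 7, 9, 5, 6, 7], [14, 9, 11, 12, 10, 11, 11, 8, 11, 12]),
    ([5, 2, 6, 5, 4, 8, 10, 6, 4, 6], [11, 9, 8, 10, 10, 10, 12, 9, 10, 11]),
    ([5, 4, 1, 7, 9, 4, 5, 2, 2, 5], [7, 6, 2, 10, 7, 7, 6, 3, 2, 9]),
]
products = sums_of_products(table_60)
ss_x = sums_of_products([(xs, xs) for xs, _ in table_60])
ss_y = sums_of_products([(ys, ys) for _, ys in table_60])
for label, row in (("products", products), ("SS X", ss_x), ("SS Y", ss_y)):
    print(label, [round(v, 2) for v in row])
```

The printed values are the total, within, and between entries for lines 1 to 3 of Table 61.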


Table 60. Score Data and Sums Based on Raw Scores for Analysis
of Variance by Covariance Adjustments

                 Group 1        Group 2        Group 3
                 Y      X       Y      X       Y      X
                14     10      11      5       7      5
                 9      6       9      2       6      4
                11      8       8      6       2      1
                12      6      10      5      10      7
                10      9      10      4       7      9
                11      7      10      8       7      4
                11      9      12     10       6      5
                 8      5       9      6       3      2
                11      6      10      4       2      2
                12      7      11      6       9      5
Sum            109     73     100     56      59     44
Mean          10.9    7.3    10.0    5.6     5.9    4.4
ΣY² or ΣX²    1213    557    1012    358     417    246
ΣXY              810            571            307

ΣΣX = 173          ΣΣY = 268
ΣΣX² = 1161        ΣΣY² = 2642
ΣΣXY = 1688
Σ(ΣX)² = 10,401    Σ(ΣY)² = 25,362
X̄ = 5.77           Ȳ = 8.93



The required computations are illustrated by using the data (fictitious) of Table 60, which contains Y and X scores for 10 cases in each of 3 groups. The scores in each of the 6 columns are separately summed to yield 109, 73, etc. The scores are squared and summed to yield 1213, 557, etc. Summing the products of the X and Y values gives 810, 571, and 307 for the 3 groups. Summing over groups yields the double summations 173, 268, etc. Certain of these sums are then substituted into formulas (108) to secure the total, within, and between sums of products of deviations. By substituting the proper sums into formulas (97), we get the required sums of squares for the X's and for the Y's. Then these 3 sets of sums are entered as the first 3 rows of Table 61, which follows the pattern set forth in Table 59.


Table 61. Analysis of Variance for X Variable of Table 60 by
Covariance Adjustments for Uncontrolled Y

                            Total     Within    Between
1.  Sum of products        142.53      72.70      69.83
2.  Sum of squares X       163.37     120.90      42.47
3.  Sum of squares Y       247.87     105.80     142.07
4.  df                         29         27          2
5.  Correlation              .708       .643       .899
5a. df for r                   28         26          1
6.  b_xy value              .5750      .6871
7.  Adjusted Σx²            81.42 minus 70.95 equals 10.47
8.  df                         28         26          2


Before proceeding to the covariance adjustment, let us consider the means given in Table 60. It will be noticed that the groups differ considerably on X, the dependent variable, and that they also differ on Y, the relevant but not controlled variable. An analysis of variance based on the sum of squares for the X's leads to a between-groups variance estimate of 42.47/2, or 21.24, and a within-groups estimate of 120.90/27, or 4.48. The F for testing the significance of the between-groups variance becomes 21.24/4.48, or 4.74, which for the given df's is significant at about the .02 or .03 level. This analysis does not, of course, allow for the fact that the groups differ on Y. If there is correlation between X and Y, the observed differences on X may be mainly a reflection of the group differences on Y. As previously stated, the purpose of the covariance adjustment is to make statistical allowance for such uncontrolled differences.












By following the steps indicated in Table 59, we determine the values in lines 6 and 7 of Table 61. Note that the adjusted Σx² for between groups, 10.47, is secured by subtracting 70.95 from 81.42. The analysis of variance based on the adjusted sums of squares (for the X's) gives a between-groups variance estimate of 10.47/2, or 5.23, and a within-groups estimate of 70.95/26, or 2.73. Then F = 5.23/2.73 = 1.92, which for 2 and 26 degrees of freedom yields a P of about .20. Accordingly, it cannot be concluded that there are significant group differences on X over and above those which would be expected because of the differences on Y.
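A Python sketch of the adjusted sums (line 7) and the two F tests, starting from the rounded total and within sums for the Table 60 data; the within figures are the total minus the between entries. The function names are hypothetical. For an F with 2 degrees of freedom in the numerator, the upper-tail area has the exact closed form (1 + 2F/df2)^(-df2/2), which the sketch uses in place of an F table; this shortcut does not hold for other numerator df's.

```python
def adjusted_sums(total, within):
    """Line 7 of Table 59. total, within: (A, B, C) = (sum of products, SS X, SS Y)."""
    a_t, b_t, c_t = total
    a_w, b_w, c_w = within
    adj_total = b_t - a_t ** 2 / c_t
    adj_within = b_w - a_w ** 2 / c_w
    return adj_total, adj_within, adj_total - adj_within


def p_upper_f_2df(f, df2):
    """Exact upper-tail P for an F ratio with 2 numerator df."""
    return (1 + 2 * f / df2) ** (-df2 / 2)


N, k = 30, 3
total = (142.53, 163.37, 247.87)
within = (72.70, 120.90, 105.80)
between_ss_x = 42.47

# Unadjusted analysis of X: df of k - 1 and N - k.
f_raw = (between_ss_x / (k - 1)) / (within[1] / (N - k))

# Adjusted analysis: the within df drops by one (line 8).
adj_t, adj_w, adj_b = adjusted_sums(total, within)
f_adj = (adj_b / (k - 1)) / (adj_w / (N - k - 1))

print(f"unadjusted F = {f_raw:.2f}, P = {p_upper_f_2df(f_raw, N - k):.3f}")
print(f"adjusted sums: {adj_t:.2f} - {adj_w:.2f} = {adj_b:.2f}")
print(f"adjusted F = {f_adj:.2f}, P = {p_upper_f_2df(f_adj, N - k - 1):.3f}")
```

With these rounded inputs the adjusted sums agree with Table 61 to within rounding, and the adjusted F comes out at about 1.92.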

It should be obvious that the use of the covariance adjustment method must be justified by logical and experimental considerations. When it is logical to control a variable by pairing or matching, then the covariance adjustment is defensible as a way of making proper allowance for a failure, because of infeasibility, to control the variable. The use of the covariance adjustment is not predicated on the degree of correlation between the dependent and the uncontrolled variable. If the correlation is relatively low, the adjusted values will differ but little from the unadjusted values; if high, both the total and within adjusted variances will differ considerably from the unadjusted variances, but, as we shall presently see, the extent to which the adjusted and unadjusted between-groups variances differ is not solely a function of the correlation.

It is of interest to make an actual adjustment of the X means of Table 60 for the group differences on Y. The adjustments can be made by

$$\bar{X}_{ja} = \bar{X}_j - b_{xy}(\bar{Y}_j - \bar{Y})$$

in which $\bar{X}_{ja}$ is the adjusted value for the jth group, and $b_{xy}$ is the within-groups regression coefficient. For the data of Table 60 we have

$$\bar{X}_{1a} = 7.30 - .687(10.90 - 8.93) = 5.95$$

$$\bar{X}_{2a} = 5.60 - .687(10.00 - 8.93) = 4.86$$

$$\bar{X}_{3a} = 4.40 - .687(5.90 - 8.93) = 6.48$$

Should the reader be surprised that the adjustment puts group 3 ahead, he should ponder the fact that, relative to the within-groups X and Y variances, the third group's $\bar{X}$ of 4.40 was not as far below the means of the other 2 groups as was its $\bar{Y}$ of 5.90.
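The three adjusted means can be reproduced with a few lines of Python. This sketch plugs in the rounded values used in the text: the within-groups coefficient .687 and the grand Y mean 8.93.

```python
b_within = 0.687      # within-groups regression coefficient, rounded as in the text
grand_mean_y = 8.93

# (mean X, mean Y) for each group of Table 60
group_means = {1: (7.30, 10.90), 2: (5.60, 10.00), 3: (4.40, 5.90)}

adjusted = {
    g: mean_x - b_within * (mean_y - grand_mean_y)
    for g, (mean_x, mean_y) in group_means.items()
}
for g, value in adjusted.items():
    print(g, round(value, 2))
```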

From a careful consideration of the foregoing, it will be seen that the covariance adjustment method will not necessarily reduce the differences between the means on the dependent variable. Situations arise in which groups that show marked differences on some correlated but uncontrolled variable may yield similar means on the variable being studied. Suppose that we are using 2 intact groups to investigate the relative merits of 2 learning methods, and that the initial means of the 2 groups are markedly different. We would, accordingly, expect a difference on final standing even though the 2 methods were equally efficacious. If this expected difference is not found, it follows that the method used by the group with the lower initial score was more effective, in that this group overtook the other group. With groups differing on an uncontrolled variable, it is not only as proper, but also as necessary, to use the covariance technique when the groups are nearly the same on the dependent variable as when they are different. For such situations the adjustment will increase the between-groups variance. The adjusted variances are sometimes referred to as “reduced” variances, but it follows from the above that this term may be a misnomer for the adjusted between-groups variance.

The extent to which the adjusted variances lead to a level of significance different from that based on an analysis of the unadjusted values will obviously depend upon 3 things: the degree of correlation between the dependent and uncontrolled variable, the size of the differences between the groups on the uncontrolled variable, and the found differences on the dependent variable. The applicability of the covariance technique does not depend upon a minimum degree of correlation or upon a definite amount of group differences on the uncontrolled variable. But, if the within-groups correlation is low and/or there is only a small, chance difference between the groups on the uncontrolled variable, the use of the covariance adjustment may not be worth the effort. Obviously, if a variable correlates near zero with the dependent variable, it need not be controlled experimentally or statistically.

The covariance method can be extended to make adjustments for group differences on more than 1 uncontrolled variable. This involves the use of multiple regression, but computationally it is perhaps simpler to handle the adjustments in terms of multiple R's. We need 2 multiple correlation coefficients, one obtained by way of correlations based on within-groups sums of squares and of products, and the other by way of correlations based on total sums of squares and of products.

If, for example, allowance is to be made for 3 uncontrolled variables, $Y_1$, $Y_2$, and $Y_3$, we will need 6 auxiliary tables (one for each pair of variables; X is the fourth, or dependent, variable) consisting of entries like those in lines 1, 2, and 3 under the “total” and the “within” columns of Table 59 (or Table 61). We can then calculate 2 sets of intercorrelations among the 4 variables (each auxiliary table will lead to 2 r's when the substitutions called for in line 5 of Table 59 are made), and from these we compute, by the methods set forth in Chapter 11, two R values. Let us designate the multiple based on the total sums as $R_t$ and that based on the within sums as $R_w$.

With these 2 multiple R's available, we may rewrite line 7 of Table 59 as

$$B_t(1 - R_t^2) \;\text{minus}\; B_w(1 - R_w^2) \;\text{equals adjusted}\; \Sigma x_b^2$$

with respective df's of

$$N - n, \qquad N - k - (n - 1), \qquad k - 1$$

for the n-variable problem (1 dependent, plus the number of uncontrolled variables included in the adjustments).
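A Python sketch of the extended line 7 for the simplest multiple case, two uncontrolled variables. It assumes the standard formula for the squared multiple correlation of X on two predictors in terms of the three intercorrelations; the function names and the correlation and B values are hypothetical. The last lines use the Table 61 total sums to confirm that, with one covariate, B(1 − r²) reduces to B − A²/C.

```python
def r2_two_predictors(r_x1, r_x2, r_12):
    """Squared multiple correlation of X on Y1 and Y2 from their intercorrelations."""
    return (r_x1 ** 2 + r_x2 ** 2 - 2 * r_x1 * r_x2 * r_12) / (1 - r_12 ** 2)


def adjusted_between_ss(b_t, r2_t, b_w, r2_w):
    """B_t(1 - R_t^2) minus B_w(1 - R_w^2)."""
    return b_t * (1 - r2_t) - b_w * (1 - r2_w)


def adjusted_dfs(n_cases, k_groups, n_vars):
    """df's for total, within, and between; n_vars = 1 dependent + covariates."""
    return n_cases - n_vars, n_cases - k_groups - (n_vars - 1), k_groups - 1


# Hypothetical intercorrelations from total and from within sums.
r2_t = r2_two_predictors(0.70, 0.50, 0.40)
r2_w = r2_two_predictors(0.60, 0.45, 0.35)
print(round(r2_t, 4), round(r2_w, 4))
print(adjusted_between_ss(160.0, r2_t, 120.0, r2_w))  # hypothetical B_t, B_w
print(adjusted_dfs(30, 3, 3))  # (27, 25, 2)

# One covariate: B(1 - r^2) equals B - A^2/C (Table 61 total sums).
a, b, c = 142.53, 163.37, 247.87
print(b * (1 - a ** 2 / (b * c)), b - a ** 2 / c)
```

With n = 2 the df function gives 28, 26, and 2, the values in line 8 of Table 61.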

Remark. The use of the covariance adjustment technique is far superior to attempts at pairing individuals from the intact groups on the basis of 1 or more uncontrolled variables, a procedure which inevitably leads to a reduction of sample size and also runs astride a regression difficulty.*

* See Thorndike, R. L., Regression fallacies in the matched groups experiment, Psychometrika, 1942, 7, 85-102.

Evaluation of changes. In Chapter 6 (pp. 90-92) we discussed the usually advocated method for comparing changes shown by experimental and control groups (applicable also for 2 experimental groups). We have, with i and f standing for the pretest and posttest measures and E and C standing for experimental and control groups,

$$D = D_E - D_C = (\bar{X}_{fE} - \bar{X}_{iE}) - (\bar{X}_{fC} - \bar{X}_{iC})$$

as the net change, the change shown by the experimentals corrected for that shown by the controls. We may rearrange the terms and yet maintain the numerical value of $D = D_E - D_C$ as follows:

$$D = (\bar{X}_{fE} - \bar{X}_{fC}) - (\bar{X}_{iE} - \bar{X}_{iC})$$

from which it is seen that the net change may also be thought of as the final difference between the 2 groups corrected for their initial difference. Such a correction involves the assumption that each unit of difference in initial standing will produce a unit of difference in final standing. In other words, this type of adjustment implies a 1-to-1 relationship between initial and final scores. Since a perfect correlation is never found or approached in practice, one may question whether the usual procedure of comparing changes is really defensible.
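The point about the implied 1-to-1 relationship can be made concrete. In the Python sketch below (hypothetical means; the function names are ours), the usual net change equals a correction of the final difference for the initial difference with the slope fixed at 1, whereas a covariance-type correction would use a slope estimated from the within-groups regression of final on initial scores (the 0.6 here is likewise hypothetical).

```python
def net_change(final_e, initial_e, final_c, initial_c):
    """D = (Xf_E - Xi_E) - (Xf_C - Xi_C), the usual comparison of changes."""
    return (final_e - initial_e) - (final_c - initial_c)


def adjusted_final_difference(final_e, initial_e, final_c, initial_c, slope):
    """Final difference corrected for the initial difference with a given slope."""
    return (final_e - final_c) - slope * (initial_e - initial_c)


# Hypothetical group means: final and initial, experimentals and controls.
fe, ie, fc, ic = 62.0, 50.0, 55.0, 46.0

print(net_change(fe, ie, fc, ic))                      # 3.0
print(adjusted_final_difference(fe, ie, fc, ic, 1.0))  # 3.0: slope of 1 reproduces D
print(adjusted_final_difference(fe, ie, fc, ic, 0.6))  # hypothetical within-groups slope
```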

It is, of course, entirely logical that group differences on final scores, which we may here call the dependent variable, should be corrected for group differences on initial standing as an uncontrolled variable. The covariance adjustment technique provides a way of correcting final means for initial differences, with due allowance for the degree of correlation between initial and final scores. The ordinary and the covariance methods differ not only in the correction but also in the resultant sampling error. The ordinary technique uses a standard error which definitely includes, either explicitly or implicitly, the variance for both initial and final scores and the correlation of initial with final, whereas the error term used in the covariance method is a direct function of the degree of correlation and of the variance for the final scores only. In other words, the net differences being tested are not the same, and neither are the error terms. The covariance method will, in general, be more sensitive. The student should read Professor R. A. Fisher's discussion on this point.†

† Chapter IX in Fisher, R. A., Design of experiments, London: Oliver and Boyd.



CHAPTER 18 


Distribution-Free Methods 


The tests of significance involving F, t, or CR (critical ratio) are based on an assumption of normality. For large samples the degree of skewness that can be tolerated is a function of sample size (see p. 100 for effect of skewness on sampling distribution of the mean), but for small samples skewness becomes a disturber. Occasionally psychologists have score data which are so markedly skewed in distribution (e.g., certain scoring categories of the Rorschach test) that it is not possible to normalize the distribution either by McCall's T scaling technique (p. 39) or by mathematical transformations.* Accordingly, we may need what have been called nonparametric or, more appropriately named, distribution-free methods.

Actually, the χ² technique can be classified as distribution-free: no assumptions are made about the distribution of the variable or variables underlying the categories. Likewise, tests of the significance of relationships by way of Spearman's rho or Kendall's tau (pp. 208-210) do not depend on assumptions regarding trait distribution.

In general, distribution-free methods, when applied for comparative purposes to data which are normal, are not as sensitive (that is, as powerful for avoiding type II errors) as the appropriate CR, t, or F technique. Consequently, it is unwise to use a nonparametric method as a short-cut for testing significance when the assumption of normality is tenable.

The sign test. Perhaps the simplest of all distribution-free methods is the “sign” test, which is applicable for testing the difference between 2 correlated sets of scores. The procedure is to consider the N pairs of differences, $X_1 - X_2$, some of which will be plus, some minus (with an occasional zero). If there is no difference between the 2 sets of scores we would expect the plus and minus signs to be equally divided. To test whether there are more plus signs than reasonable on a chance basis, the binomial $(p + q)^N$, with p = .50, is used (N is for the pair differences having a sign; it is the sample size less the number of zero differences) in the manner discussed earlier (pp. 49-51). For effective N larger than 10 we may use either the normal curve approximation to the binomial (pp. 46-49) or the χ² approximation (pp. 212-214). Whether one uses the binomial itself or one of the approximations, care must be taken to secure a P that represents whichever test, one-tailed or two-tailed, is appropriate for the hypothesis being tested.

* See Mueller, C. G., Numerical transformations in the analysis of experimental data, Psychol. Bull., 1949, 46, 198-223.
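A minimal Python sketch of the exact sign test (the function name and the scores are hypothetical). Zero differences are dropped before counting, and both the one-tailed P (for the alternative that plus signs predominate) and the two-tailed P are returned, since which one applies depends on the hypothesis. The effective N here is 8, small enough that the binomial itself, rather than an approximation, is appropriate.

```python
from math import comb


def sign_test(first, second):
    """Exact binomial sign test with p = q = .50.

    Returns (plus signs, effective N, one-tailed P, two-tailed P); the
    one-tailed P is for the alternative that plus signs predominate.
    """
    diffs = [a - b for a, b in zip(first, second) if a != b]  # drop zeros
    n = len(diffs)
    plus = sum(1 for d in diffs if d > 0)

    def upper_tail(k):
        # P(k or more of the n signs are plus) under the binomial (1/2 + 1/2)^n
        return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

    p_one = upper_tail(plus)
    p_two = min(1.0, 2 * upper_tail(max(plus, n - plus)))
    return plus, n, p_one, p_two


scores_1 = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]
scores_2 = [10, 14, 11, 12, 13, 13, 13, 12, 11, 10]
print(sign_test(scores_1, scores_2))  # (7, 8, 0.03515625, 0.0703125)
```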

The “median” test. A procedure for testing the difference between 2 sets of independent scores is to use the median for the 2 groups combined as a basis for dichotomizing. This leads to a fourfold table: above vs. below (the median) on 1 axis, group vs. group on the other. Then the χ² test for the fourfold table (pp. 224-225) may be employed, with Yates's correction if necessary. With very small N's the exact probability method (pp. 240-242) would be used. The idea back of the median test is simply that 2 samples drawn from 2 populations having the same median should yield equal splits. In practice, difficulties are sometimes encountered in attempting to dichotomize exactly at the median. When the median is an integer and several scores are equal to the median, the dichotomy can be taken as those scores which exceed the median vs. those which do not exceed the median.
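A Python sketch of the median test for 2 independent groups (function name and scores hypothetical). Scores are split into those that exceed the combined median and those that do not, and χ² for the fourfold table is computed by the usual shortcut formula, optionally with Yates's correction. The sketch assumes every margin of the table is nonzero and does not attempt the exact method for very small N's.

```python
from statistics import median


def median_test_2x2(group1, group2, yates=True):
    """Median test for 2 independent groups via chi-square for the fourfold table.

    Scores are split at the combined median into 'exceeds' vs 'does not exceed'.
    Returns the cell counts (a, b, c, d) and chi-square with 1 df.
    """
    med = median(group1 + group2)
    a = sum(1 for s in group1 if s > med)  # group 1, above median
    b = len(group1) - a                    # group 1, not above
    c = sum(1 for s in group2 if s > med)  # group 2, above median
    d = len(group2) - c                    # group 2, not above
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(0.0, diff - n / 2)      # Yates's correction for continuity
    chi2 = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return (a, b, c, d), chi2


group1 = [8, 12, 15, 9, 14, 11, 13, 16]
group2 = [5, 7, 10, 6, 9, 4, 8, 11]
table, chi2 = median_test_2x2(group1, group2)
print(table, chi2)  # (6, 2, 2, 6) 2.25
```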

Median test for more than 2 independent groups. This is a straightforward extension of the median test to provide an over-all test of the differences between, say, C independently drawn groups. On the basis of the median of the distribution of the C groups combined, the scores are dichotomized (as near the median as possible). This will lead to a 2 by C table from which one may obtain a χ² with C − 1 degrees of freedom.

Whether we are dealing with 2 groups or with C groups, the N's for the groups need not be equal for use of the median test.
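Extending the previous idea to C groups, this Python sketch (hypothetical function name and scores) builds the 2 by C table of "exceeds the pooled median" vs. "does not" and sums (observed − expected)²/expected over its cells. No continuity correction is applied, so with 2 groups it gives the uncorrected fourfold χ².

```python
from statistics import median


def median_test_c_groups(groups):
    """Median test for C independent groups: chi-square for the 2 by C table."""
    pooled = [s for g in groups for s in g]
    med = median(pooled)
    n = len(pooled)
    above_total = sum(1 for s in pooled if s > med)
    chi2 = 0.0
    for g in groups:
        above = sum(1 for s in g if s > med)
        cells = ((above, above_total), (len(g) - above, n - above_total))
        for observed, row_total in cells:
            expected = row_total * len(g) / n  # row total times column total over N
            chi2 += (observed - expected) ** 2 / expected
    return chi2, len(groups) - 1  # chi-square and its C - 1 degrees of freedom


three_groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(median_test_c_groups(three_groups))
```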

Test of C correlated sets. Suppose R individuals (or R sets of matched persons) with scores under C different conditions. This is, of course, the familiar 2-way classification setup which we discussed (Chapter 16) under the analysis of variance, mixed model. We shall be concerned here with testing the effect of the C conditions.

For the distribution-free test we need to arrange the scores in R rows and C columns. The median score for each row is determined, and then the scores in each row which exceed the row median are assigned a plus. This will lead to C/2 pluses in each row if C is an even number, and to (C − 1)/2 if C is an odd number. Any row having all scores identical is ignored, and therefore R in the following exposition is the original R minus the number of rows with identical scores in the row. The pluses for each column are counted. Let $T_c$ stand for the number of pluses in the cth column. For C even we will have a total of RC/2 plus signs. If there are no real column effects we would expect these RC/2 pluses to be distributed evenly over the columns. Thus we would expect R/2 pluses per column, on the basis of the null hypothesis. For C odd we will have a total of R(C − 1)/2 plus signs, which when divided evenly among the columns would give

$$\frac{R(C - 1)}{2C}$$

as the chance expected number of pluses per column.

With an observed number of pluses per column, and an expected number per column, we have what begins to look like a χ² situation, and so it is, but not an ordinary one. The manner of assigning pluses leads to subtle restrictions, such that we have

$$\chi^2 = \frac{C(C - 1)}{RA(C - A)} \sum_c \left(T_c - \frac{RA}{C}\right)^2 \tag{109}$$

with C − 1 degrees of freedom. The value of A is taken as C/2 when C is even and as (C − 1)/2 when C is odd. Note that RA/C is the expected value on the assumption of no real column effects.

Mood,† who presents the foregoing test, states that for the test to be valid either R should be as large as 10 or RC should be 20 or more. When C = 2, this method yields exactly the same χ² as that obtained by the χ² technique applied to the sign test.
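A Python sketch of formula (109); the function name and the data are hypothetical. Rows whose scores are all identical are dropped, as the text directs. The sketch assumes no other ties at a row median (it raises an error if a row does not yield exactly A pluses), since the text does not say how such ties are handled. The first example has C = 2 and reproduces the sign-test χ² of (7 − 1)²/8 = 4.5; the second, with C = 3, is far smaller than Mood's recommended minimum and only illustrates the arithmetic.

```python
from statistics import median


def mood_correlated_sets(rows):
    """Distribution-free test of C conditions for R matched rows, formula (109).

    rows: one list of C scores per individual (or matched set).
    Returns (chi-square, df, pluses per column).
    """
    rows = [r for r in rows if len(set(r)) > 1]  # ignore rows with all scores identical
    R, C = len(rows), len(rows[0])
    A = C // 2  # C/2 if C is even, (C - 1)/2 if C is odd
    T = [0] * C
    for row in rows:
        med = median(row)
        pluses = [c for c, score in enumerate(row) if score > med]
        if len(pluses) != A:
            raise ValueError("ties at a row median; this sketch does not handle them")
        for c in pluses:
            T[c] += 1
    expected = R * A / C
    chi2 = C * (C - 1) / (R * A * (C - A)) * sum((t - expected) ** 2 for t in T)
    return chi2, C - 1, T


pairs = [[12, 10], [15, 14], [11, 11], [14, 12], [13, 13],
         [16, 13], [12, 13], [15, 12], [14, 11], [13, 10]]
print(mood_correlated_sets(pairs))  # (4.5, 1, [7, 1])

three_conditions = [[3, 5, 8], [2, 6, 7], [4, 5, 9], [1, 3, 2], [5, 8, 7]]
print(mood_correlated_sets(three_conditions))
```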

Mann-Whitney U test. This test, which is applicable only to results based on 2 independent groups, involves rank ordering the scores, for the 2 groups combined, from greatest (rank 1) to least (for which the rank will be $N = N_1 + N_2$ unless there are ties for the bottom position). When ties occur, each person involved is assigned the average of the ranks that would be assigned in case the tied persons could be differentiated (see p. 209). Then the ranks so assigned are summed separately for each group. Let $T_1$ and $T_2$ represent these 2 sums. (As a check on the arithmetic, $T_1 + T_2$ should equal $N(N + 1)/2$, the sum of the first N natural numbers.)

When both $N_1$ and $N_2$ are 8 or greater, the statistic

$$U_1 = N_1N_2 + \frac{N_1(N_1 + 1)}{2} - T_1 \tag{110}$$

is distributed normally about a chance expected value, or mean, given by $N_1N_2/2$, and with variance of $N_1N_2(N_1 + N_2 + 1)/12$. We then have

$$\frac{x}{\sigma} = \frac{U_1 - N_1N_2/2}{\sqrt{\dfrac{N_1N_2(N_1 + N_2 + 1)}{12}}}$$

as a unit normal deviate by which the significance of U as a deviation from the null hypothesis expected value is determined. If, as an alternate, we define U by replacing $T_1$ with $T_2$ and $N_1$ with $N_2$ (in the second term), we will have $U_2$. Now $U_1$ and $U_2$ will deviate to the same extent, but in opposite directions, from $N_1N_2/2$.

† See Chapter 16 of Mood, A. M., Introduction to the theory of statistics, New York: McGraw-Hill, 1950.

When $U_1$ is larger than $N_1N_2/2$, the direction of the difference between the 2 sets of scores is such that group 1 is superior to group 2. (If ranks are assigned with the least score as rank 1, and so on, the value of $U_1$ will be smaller than $N_1N_2/2$ when group 1 is superior.) For $N_1$ and $N_2$ less than 8, special tables are required for judging the significance of U. These may be found in an article by Mann and Whitney, Annals math. Stat., 1947, 18, 50-60. Note that these tables are set up on the basis of the least score being assigned a rank of 1.
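A minimal Python sketch of formula (110) and its normal approximation (function names and scores hypothetical). Ranks run from the greatest score (rank 1) downward, tied scores share the average of the ranks they occupy, and the sum check $T_1 + T_2 = N(N + 1)/2$ from the text is asserted. Both groups have 8 cases, the smallest size for which the normal approximation is recommended above.

```python
import math


def ranks_greatest_first(scores):
    """Rank 1 = greatest score; tied scores share the average of their ranks."""
    ordered = sorted(scores, reverse=True)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and ordered[j + 1] == ordered[i]:
            j += 1
        rank_of[ordered[i]] = (i + 1 + j + 1) / 2  # average of ranks i+1 .. j+1
        i = j + 1
    return [rank_of[s] for s in scores]


def mann_whitney_u(group1, group2):
    """U1, U2 by formula (110) and the normal deviate for U1."""
    n1, n2 = len(group1), len(group2)
    n = n1 + n2
    ranks = ranks_greatest_first(group1 + group2)
    t1, t2 = sum(ranks[:n1]), sum(ranks[n1:])
    assert math.isclose(t1 + t2, n * (n + 1) / 2)  # arithmetic check from the text
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - t1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - t2
    z = (u1 - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n + 1) / 12)
    return u1, u2, z


group1 = [8, 12, 15, 9, 14, 11, 13, 16]
group2 = [5, 7, 10, 6, 9, 4, 8, 11]
u1, u2, z = mann_whitney_u(group1, group2)
print(u1, u2, round(z, 3))  # 57.5 6.5 2.678
```

Since $U_1$ exceeds $N_1N_2/2 = 32$, group 1 is the superior group, and $U_1$ and $U_2$ lie equally far (25.5) on either side of 32.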

The U test is more powerful than the median test, and hence is preferable to the latter unless there are too many ties in ranks.

For other distribution-free methods the reader is referred to an article by Lincoln Moses, Psychol. Bull., 1952, 49, 122-143. It might be remarked that at present little is known about the relative merits of the many available techniques.



CHAPTER 19 


Remarks on Error Reduction 


In this brief chapter we shall attempt a summary and integration of implications, scattered through this book, having to do with the reduction of error variance in psychological research. In a sense, this is an extension of an earlier discussion (pp. 88-90). Some of the additional concepts and techniques could not have been introduced at that time since an understanding of them is dependent on material presented in the intervening chapters.

For our present purpose, we shall subsume errors under 3 headings: measurement or observational errors, errors in inferring population parameters in field or survey studies, and errors in experimental testing of hypotheses. About the first of these, we remark only that errors of measurement can be reduced by developing more reliable tests or (when feasible) by averaging repeated measurements.

FIELD STUDIES (SURVEYS) 

Surveys for the purpose of gauging opinion, and studies designed to establish normative data, require large scale sampling. The aim is to secure a sample which is unbiased, that is, representative of a defined population, with chance sampling errors as small as possible. We shall limit ourselves to 3 sampling methods: random, stratified, and area.

Random sampling. The conditions of random sampling have been specified earlier (p. 55). By the method of random sampling it is fairly easy to arrive at a representative sample, provided the universe has been catalogued. Thus, if one wishes a representative sample of school children of a certain grade in a city, he can secure it by a purely mechanical scheme, such as taking every nth card from the files. Although this type of systematic sampling does not exactly satisfy the conditions of random sampling, it will assure a random sample unless the cards have been systematically arranged (in a somewhat peculiar order). The use of the random method for sampling an uncatalogued population involves so many difficulties in psychological research that no schemes are to be found in the literature.

Increasing sample size is the only way by which one can reduce chance errors when the random method is being employed. That sheer sample size is not enough to reduce nonrandom errors is evidenced by the Literary Digest straw polls, which rested on the assumption that the population of telephone subscribers and car owners was not different in its voting preference from the entire population of potential voters. This happened to hold before 1936, so that replies to ballots mailed at random to telephone subscribers and car owners forecasted fairly accurately the election results. Despite a very large sample, the Digest poll failed miserably in 1936; this failure is attributed to the alignment of voting to income levels, an alignment that did not exist in prior years.

Stratified sampling. In the stratified method, one or more individuals are pulled at random from each of several strata, the number in the sample from each stratum being proportional to the universe number in the stratum, and the strata are predetermined by knowledge of some control variable or variables. Psychologists who sample so as to secure proportionate representation from the several occupational levels are, in effect, using the principle of stratification. It should be obvious that the method can be used only when information is available on some variable or variables which permits their use in setting up the strata, and when cases within the strata can be drawn randomly.

When the sampling is for attributes by the stratified method, the standard error of an obtained proportion, P, is given, in terms of information yielded by the sample, approximately by

$$\sigma_P = \sqrt{\frac{PQ - \sigma_{p_s}^2}{N}} \tag{111}$$

where P equals the proportion in the total sample, N, who possess the attribute, Q = 1 − P, and $\sigma_{p_s}^2$ is the weighted variance of the several strata proportions about the sample value P. A casual examination of formula (111) shows that the magnitude of the error is less for a stratified sample than for a random sample, and that the increase in precision depends upon one's ability to stratify the universe in such a way as to secure strata which are really different with regard to the attribute being studied.
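Formula (111) can be evaluated from stratum results. The Python sketch below assumes a proportionate stratified sample and uses hypothetical stratum sizes and proportions; it reports the stratified standard error alongside √(PQ/N), the value for a simple random sample of the same size, to show the gain when strata differ on the attribute.

```python
import math


def stratified_se_proportion(strata):
    """Formula (111) for a proportionate stratified sample.

    strata: list of (n_h, p_h) pairs, the number sampled and the proportion
    possessing the attribute in each stratum.
    Returns (P, stratified sigma_P, simple-random sigma_P).
    """
    N = sum(n for n, _ in strata)
    P = sum(n * p for n, p in strata) / N
    Q = 1 - P
    # Weighted variance of the strata proportions about P.
    var_ps = sum(n * (p - P) ** 2 for n, p in strata) / N
    se_stratified = math.sqrt((P * Q - var_ps) / N)
    se_random = math.sqrt(P * Q / N)
    return P, se_stratified, se_random


strata = [(200, 0.30), (300, 0.50), (500, 0.70)]  # hypothetical
P, se_s, se_r = stratified_se_proportion(strata)
print(round(P, 2), round(se_s, 4), round(se_r, 4))  # 0.56 0.0149 0.0157
```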

For stratified sampling, the variance of the mean may be written as

$$\sigma_{\bar{X}}^2 = \frac{\sigma^2 - \sigma_{\bar{X}_s}^2}{N} \tag{112}$$

where $\bar{X}$ = the sample mean, $\sigma^2$ = the sample variance, and $\sigma_{\bar{X}_s}^2$ = the weighted variance of the means of the several strata about the total sample mean. If stratification has been accomplished by use of a variable, Y, which is linearly related to the variable being studied, the formula can be written in the form

$$\sigma_{\bar{X}}^2 = \frac{\sigma_x^2 - \sigma_x^2 r_{xy}^2}{N} \tag{113}$$

It will be noticed that stratified sampling does lead to greater precision in the sense of smaller chance error, but only when the control or stratifying variable is related to the variable being studied.
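A companion Python sketch for formula (112), using hypothetical raw scores for three strata of a proportionate sample. Variances are taken with N in the denominator, as with σ in the text. It checks that (σ² − σ²ₛ)/N, where σ²ₛ is the weighted variance of the strata means, equals the weighted within-strata variance divided by N, and that it is smaller than σ²/N for a simple random sample of the same size.

```python
def variance(scores):
    """Variance about the mean, with N (not N - 1) in the denominator."""
    m = sum(scores) / len(scores)
    return sum((s - m) ** 2 for s in scores) / len(scores)


# Hypothetical proportionate stratified sample: one list of scores per stratum.
strata = [
    [38, 42, 40, 45, 35],
    [50, 47, 55, 52, 49, 51, 46, 50],
    [60, 57, 58, 61, 55, 59],
]
pooled = [s for stratum in strata for s in stratum]
N = len(pooled)
grand_mean = sum(pooled) / N

sigma2 = variance(pooled)  # sample variance
# Weighted variance of the strata means about the total sample mean.
sigma2_m = sum(len(s) * (sum(s) / len(s) - grand_mean) ** 2 for s in strata) / N

var_mean_stratified = (sigma2 - sigma2_m) / N  # formula (112)
var_mean_random = sigma2 / N                   # simple random sample of size N
# The same quantity computed as the weighted within-strata variance over N.
within = sum(len(s) * variance(s) for s in strata) / N
print(var_mean_stratified, within / N, var_mean_random)
```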

The quota method involves the use of strata, but selection within the strata is not done on a random basis: the field worker merely fills a quota by securing the correct proportion per stratum, so selective factors leading to bias can easily operate.

Area sampling. There is considerable evidence that area or “pin point” sampling is the best method yet devised for drawing samples in survey studies. Its use, however, depends on the availability of extensive facilities. The student who is interested in this, or the stratified, method will wish to turn to detailed treatments of the subject.*

SAMPLING ERRORS IN EXPERIMENTATION 

The formation of groups for experimental purposes can be 
accomplished (1) by random samplmg— the random assigning of 

* Yates, F., Sampling methods for censuses and surveys. New York: Hafner,
1949; Deming, W. E., Some theory of sampling. New York: Wiley, 1950.





Remarks on Error Reduction 


individuals to the groups, (2) by pairing, (3) by using sibs or
littermates, (4) by matching distributions, and (5) by using the
same person under all the experimental conditions. The last-
mentioned will not be feasible when practice or fatigue effects are
likely.

For methods 2, 3, and 5 the statistical analysis is by way of the
analysis of variance (mixed model), with rows standing for the
matched persons, the litters, or the individuals, respectively, for the
3 methods. The F test of the significance of the differences among
the correlated means (the means for conditions) involves an error
term which is freed of the row variation; stated differently, the
error term (an estimate of a 2-way interaction variance) tends to
be small if the correlations between the matched persons, or be-
tween sibs, or between scores on the same persons are large. The
foregoing argument holds, of course, for just 2 experimental
groups (or an experimental and a control group) as well as for
3 or more groups.
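The removal of row variation from the error term can be seen in a minimal rows-by-conditions analysis with one score per cell. The sketch assumes rows are random and conditions fixed (the mixed model), so the F for conditions uses the row-by-condition interaction mean square as its error term; for contrast only, it also reports the F that a random-groups analysis would give by leaving the row variation in the error. The 3 × 2 table of scores is hypothetical, and the function name is illustrative.

```python
def matched_groups_anova(table):
    """table[i][j]: score of row i (matched set, litter, or person) under
    condition j. Returns sums of squares and two F ratios for conditions."""
    r, c = len(table), len(table[0])
    grand = sum(map(sum, table)) / (r * c)
    row_means = [sum(row) / c for row in table]
    col_means = [sum(row[j] for row in table) / r for j in range(c)]
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_rows = c * sum((m - grand) ** 2 for m in row_means)
    ss_cols = r * sum((m - grand) ** 2 for m in col_means)
    ss_inter = ss_total - ss_rows - ss_cols          # row-by-condition interaction
    ms_cols = ss_cols / (c - 1)
    ms_inter = ss_inter / ((r - 1) * (c - 1))        # error term, rows removed
    ms_within = (ss_total - ss_cols) / (c * (r - 1)) # error term, rows left in
    return {
        "ss_rows": ss_rows, "ss_cols": ss_cols, "ss_inter": ss_inter,
        "F_blocked": ms_cols / ms_inter,
        "F_ignoring_rows": ms_cols / ms_within,
    }

# Hypothetical scores: 3 matched sets (rows) under 2 conditions (columns).
result = matched_groups_anova([[2, 4], [6, 10], [10, 10]])
print(result["F_blocked"], round(result["F_ignoring_rows"], 3))  # 3.0 0.429
```

Here the rows account for 52 of the total sum of squares of 62, so removing them cuts the error mean square from 14 to 2 and raises F from about .43 to 3.0.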

Thus, compared to method 1 (random assignment), greater
precision is attainable by using method 2, 3, or 5. Before dis-
cussing method 4, let us again consider the situation where groups
are needed for just 2 conditions. If the groups are formed by
pairing individuals, the sampling variance of the difference be-
tween the 2 means is, as we learned in Chapter 6, given by

$$\sigma^2_D = \sigma^2_{M_1} + \sigma^2_{M_2} - 2r_{12}\sigma_{M_1}\sigma_{M_2} \qquad (256)$$

The gain in pairing, over random assignment, depends on the
magnitude of $r_{12}$. It can be shown that if the pairing is done on
the basis of variable Y, the value of $r_{12}$ will be $r^2_{xy}$, and in case
2 or more variables are controlled by pairing, $r_{12}$ will be the
square of the multiple correlation between the dependent vari-
able, X, and the control variables. The reason for pairing, it will
be recalled, is to make the groups comparable on certain variables
which might affect the outcome of the experiment. We now see
explicitly that the advantage of pairing depends definitely on
how highly the variables so controlled are correlated with the
dependent variable: no correlation, no gain; low correlation,
little gain.
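The rule that pairing on Y makes r12 equal to r²_xy shows how quickly the advantage of pairing fades as the correlation drops. This sketch assumes formula (256) is σ²_D = σ²_M1 + σ²_M2 − 2r12·σ_M1·σ_M2, equal standard deviations in the 2 groups, and r12 = 0 for random assignment; σ_x = 10 and 50 pairs are hypothetical values, and the function name is illustrative.

```python
import math

def se_diff_paired(se_m1, se_m2, r12):
    """Standard error of the difference between 2 means, assumed form of (256)."""
    return math.sqrt(se_m1 ** 2 + se_m2 ** 2 - 2 * r12 * se_m1 * se_m2)

# Hypothetical setting: sigma_x = 10 in both groups, 50 pairs.
se_m = 10 / math.sqrt(50)
random_se = se_diff_paired(se_m, se_m, 0.0)   # independent groups: r12 = 0
for r_xy in (0.0, 0.3, 0.5, 0.7, 0.9):
    r12 = r_xy ** 2                           # pairing on Y: r12 = r_xy squared
    ratio = se_diff_paired(se_m, se_m, r12) / random_se
    print(f"r_xy = {r_xy:.1f}   r12 = {r12:.2f}   paired SE / random SE = {ratio:.3f}")
```

Under these assumptions the ratio reduces to √(1 − r²_xy): about .954 for r_xy = .3, .866 for .5, .714 for .7, and .436 for .9.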

Method 4 is another way of making groups comparable on
pertinent variables. Instead of pairing persons, distributions are
matched for the Y variable, to be controlled, in such a manner
that the 2 groups contain the same proportions of cases in the
several intervals as hold for a supply distribution on Y. The
sampling variance of the difference between the 2 X means is
given by

$$\sigma^2_D = \frac{\sigma^2_{x_1}(1 - r^2_{x_1y})}{N_1} + \frac{\sigma^2_{x_2}(1 - r^2_{x_2y})}{N_2} \qquad (114)$$

If the matching has been made on the basis of several control
variables, the 2 correlations (1 for each group) become the multi-
ple r's between X and the control variables.

From (114) one may deduce the following fact: where 2 groups
have been separately matched as to distribution on the same
control variable(s), the standard error of the difference can be ob-
tained without the restriction of the ordinary pairing procedure,
which requires that there be an equal number of cases in the
2 groups. The reader will note that either term in formula (114)
is, as might be expected, identical to formula (113) for the sam-
pling variance of a mean when the stratified method is used. The
method of matching distributions is particularly useful when the
cost per case is much greater in the experimental group than in
the control group. Precision can then be increased by taking a larger
control group, a possibility also when the groups are chosen by
randomization.
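Because each group contributes its own term, formula (114) permits unequal group sizes. The sketch below, assuming (114) is the sum of two terms of the form σ²_x(1 − r²_xy)/N, shows how enlarging an inexpensive control group reduces the standard error of the difference. The variances, correlations, and group sizes are hypothetical, and the function name is illustrative.

```python
import math

def se_diff_matched_distributions(var1, n1, r1, var2, n2, r2):
    """Standard error of the difference between 2 means when each group's
    distribution on Y is matched to a common supply distribution. Each group
    contributes var * (1 - r**2) / n, so n1 and n2 may differ."""
    return math.sqrt(var1 * (1 - r1 ** 2) / n1 + var2 * (1 - r2 ** 2) / n2)

# Hypothetical: variance 100 and r_xy = .6 in both groups; the costly
# experimental group is held at 30 cases while the control group grows.
for n_control in (30, 60, 120):
    se = se_diff_matched_distributions(100.0, 30, 0.6, 100.0, n_control, 0.6)
    print(f"N_control = {n_control:3d}: SE of difference = {se:.3f}")
```

The experimental group's own term, here √(100 × .64/30) ≈ 1.46, sets a floor, so each further enlargement of the control group buys less.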

The use of paired individuals for experimental (and control)
conditions has long been recognized as a sound procedure. One
might argue, however, that the advantages of pairing have been
overstressed. The gain in error reduction may not be appre-
ciable. The advocates of pairing say that they are not willing to
risk randomization as a method for setting up groups, but it
should be noted that there are always numerous variables which
might affect the outcome of an experiment that are never con-
trolled except by randomization. Thus one can seldom, if ever,
completely avoid placing faith in the randomization process.
Random differences between groups never have more than a ran-
dom effect on the results; the error formulas always include all
random effects. When pairing leads to only a slight reduction in
error, we have evidence that the pairing procedure may not have
been worth the effort involved.

It should be noted that an original group which is split into
experimental groups either by the random method or by pairing
must be regarded as representative of some defined universe, and
that such conclusions as are drawn from the experiment cannot
be generalized unless it can be shown that the defined universe is
representative of the generality of mankind with respect to the
variables being studied. In other words, those who use the college
sophomore as a laboratory representative of mankind have not,
by showing that selective factors did not render their
experimental groups noncomparable, avoided the necessity of bridging the
gap between the sophomore's behavior and that of the typical
human being.

At this point, we remind the student that the covariance adjust-
ment method (Chapter 17) is an entirely legitimate technique for
allowing for uncontrolled variables and at the same time reducing
error variance.

It is appropriate to end this discussion (and the text) with an
example of an experiment in which error reduction might have
been achieved by judicious planning. The Lanarkshire milk
experiment in Scotland involved the daily feeding of three-fourths
of a pint of raw milk to 5000 children and of an equal amount of
pasteurized milk to another group of 5000 over a period of 4
months. These 10,000, plus a control group of 10,000, were
measured for height and weight at the beginning and the end of
the 4-month period. Since the purpose of the experiment was
to check on the relative merits of raw vs. pasteurized milk, the
control group was nonessential. (It is an interesting commentary
on the magic of the word "control" that very frequently a con-
trol group is used when not needed.) Despite large numbers, the
feeder and control groups were not comparable as regards initial
height and weight, the operating selective factor being the benevo-
lent attitude of school teachers who apparently thought the re-
search would not be harmed if preference was given to frail,
undernourished children in choosing individuals for the feeder
groups. Either a carefully supervised random, or a definite
pairing, procedure would have avoided this selective bias, but
what is more important and more relevant to our present topic is
the claim in a paper† by "Student," so far not refuted, that the
use of 50 pairs of identical twins would have yielded as precise
information at only 2 per cent of the cost of the original experi-
ment, or at a saving of approximately 35,000 prewar dollars.

† "Student," The Lanarkshire milk experiment. Biometrika, 1931, 23,
398-406.



EXERCISE MATERIAL FOR ELEMENTARY STATISTICS* 


Chapter 2 

1. a. Make separate frequency distributions for the marks of the two groups
of students in Table I. Use intervals of size 5.
b. Determine also the cumulative frequencies for each group.


Table I. Final Examination Marks for a Class in Statistics

Students with No Calculus (N = 36)        Students with Some Calculus (N = 22)

[Individual marks illegible in the source scan.]




2. a. Make separate frequency distributions for the two groups of scores in
Table II. Use intervals of size 3.
b. Determine also the cumulative frequencies for each group.

Table II. Scores on Final Examination for a Course on Psychological Tests

Undergraduates (N = 32)        Graduate Students (N = 23)


70 

72 

76 

66 

76 

80 

84 

80 

90 

82 

84 

67 

69 

90 

60 

76 

47 

79 

62 

77 

89 

70 

51 

58 

71 

88 

65 

54 

73 

74 

87 

76 

79 

89 

64 

80 

67 

71 

90 

85 

95 

78 

69 

97 

91 

71 

63 

81 

87 

81 

78 

86 

92 




79 79 


3. a. Draw a frequency polygon for the distribution in Table III, part A.
b. Draw an ogive for the data of Table III, part A.

4. a. Draw a frequency polygon for the distribution of Table III, part B.
b. Draw an ogive for the data of Table III, part B.

* These exercises are so arranged that, in general, each even-numbered exer-
cise is of the same type as its immediately preceding odd-numbered exercise.



Exercises 


Chapter 3 

5. For the scores of Table I, compute separately for the two groups
a. the medians, using the undistributed scores.
b. the medians, using the frequency distributions.

6. Repeat exercise (5) with the data of Table II.

7. Compute the mean for each group in Table I by
a. the definition formula for the mean.
b. the arbitrary origin method.

8. Repeat exercise (7) with the data of Table II.

9. Combine the two distributions for the data of Table I, compute the mean
by the arbitrary origin method, and check by using the formula for secur-
ing the mean for a combined group (use the means obtained by the arbi-
trary origin method for this check).

10. Repeat exercise (9) with the data of Table II.

Table III. Distributions of IQ's, Form L of Revised Stanford-Binet Scale

               A. Ages 2½-5½       B. Ages 6-13
IQ               f     cuf           f     cuf
170-179                              1    1623
160-169          4     728           1    1622
150-159          4     724           3    1621
140-149         11     720          29    1618
130-139         41     709          73    1589
120-129         82     668         140    1516
110-119        175     586         308    1376
100-109        193     411         407    1068
90-99          107     218         335     661
80-89           76     111         215     326
70-79           20      35          76     111
60-69            7      15          30      35
50-59            5       8           4       5
40-49            2       3           1       1
30-39            1       1

               N = 728             N = 1623


11. Suppose the following intervals for the heights of trees, measured to the
nearest inch: 250-274; 275-299; 300-324.

What is the midpoint of the 250-274 interval?

What is the lower limit of the 275-299 interval?

What is the upper limit of the 275-299 interval?

12. Given the following intervals for length, to the nearest inch: 42-44; 45-47;
48-50.

What is the midpoint of the 42-44 interval?





What is the lower limit of the 48-50 interval?

What is the upper limit of the 45-47 interval?

13. Specify the midpoint values for the bottom interval of each of the fol-
lowing:

Age at Last Birthday    Weight to Nearest Pound    Size of College Classes
30-34                   144-147                    15-19
25-29                   140-143                    10-14


14. Give the midpoints of the first of the following intervals:
a. 45-49; 50-54 (age at last birthday)
b. 45-49; 50-54 (size of college classes)
c. 45-49; 50-54 (length to nearest centimeter)

15. What arbitrary rule would you follow in determining lower limits and
midpoints of intervals for the data of Tables I and II?

16. As is commonly known, an IQ is obtained by multiplying the quotient,
MA/CA, by 100 and rounding to the nearest integer. Now CA is taken as
age to the nearest month, whereas an MA of, say, 88 months means at
least 88 months. Considering these facts, what would be the exact lower
limit for the bottom interval of Table III?

17. Compute the median, Q1, and Q3 for the distribution in Table III, part A.

18. Compute the median, Q1, and Q3 for the distribution in Table III, part B.

19. Compute the 10th and 90th percentile points for the distribution in Table
III, part A.

20. Compute the 20th and the 80th percentile points for the distribution in
Table III, part B.

21. Using the results of exercises (17) and (19), locate the five points, Q1, the
median, Q3, P10, and P90, on the base line of your ogive curve for the dis-
tribution of Table III, part A. Divide the ordinate on the right-hand
side (the ordinate at IQ = 170) into approximate fourths. Draw a line
from each of the five base-line points up to the ogive, then horizontally
to the right. Notice where these horizontal lines hit the ordinate on the
right-hand side.

22. Using the results of exercises (18) and (20), locate the five points, Q1, the
median, Q3, P20, and P80, on the base line of your ogive curve for the dis-
tribution of Table III, part B. Divide the ordinate on the right-hand
side (the ordinate at IQ = 180) into approximate fourths. Draw a line
from each of the five base-line points up to the ogive, then horizontally to
the right. Note where these horizontals hit the ordinate on the right-
hand side.

23. a. Compute the SD's (σ's) for the two groups in Table I (use arbitrary
origin method).
b. Combine the two distributions in Table I, compute the SD, then see
whether the obtained SD agrees with that secured by using formula (8).

24. Repeat exercise (23) with the data of Table II.

25. For the distribution of IQ's in Table III, part A, the mean is 106.68 and
the standard deviation is 17.41.

a. Determine the two points defined by M ± σ.
b. Determine the two points defined by M ± 2σ.
c. Locate these four points, also the mean, on the base line of your fre-
quency polygon for the data of Table III, part A. Erect ordinates
from each of these five base-line points to the polygon, and study the
resulting picture.
d. Determine approximately the percentage of cases between M ± σ, also
between M ± 2σ.

26. The distribution of IQ's in Table III, part B, has a mean of 103.34 and
a σ of 16.88. Repeat exercise (25), using the values and polygon for the
data of Table III, part B.

27. Suppose the mean score on a statistics quiz is 35, the median is 36, the
SD is 6, and the quartile deviation is 4.

a. If to each person's score we added 50 points, what values would we
then get for the mean, the median, the SD, and the quartile deviation?
b. If we doubled each person's score, what would be the values of the new
mean and SD?

28. Given that the distribution of scores on a quiz leads to a mean of 40, a
median of 38, an SD of 9, and a quartile deviation of 6:

a. If we added 10 points to the scores of each student, what would be the
values for M, Mdn, SD, and Q?
b. If all scores were halved, what would be the values of the mean and
the SD?

29. For each of the following three sets of two groups, determine the mean
for the two groups combined.

a. N1 = 40, M1 = 28; N2 = 40, M2 = 23.
b. N1 = 100, M1 = 44; N2 = 60, M2 = 60.
c. N1 = 12,489, M1 = 228.63; N2 = 6971, M2 = 228.63.

30. Given that the mean weekly pay of the seven working members of the
Jones family is $55 and the median is $50 (both after deductions):

a. What is the weekly "take home" of the family?
b. Suppose that Daddy Jones, already the best paid, receives an increase
which after deductions amounts to $6 a week. What is the new mean?
What is the new median?

31. If an SD is 9 when computed from a frequency distribution with intervals
of size 6, what would you expect it to be if computed by using the defi-
nition formula for SD?

32. How large is the grouping error in an SD of 13 computed from a distri-
bution with intervals of size 12?
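The arbitrary-origin method called for in exercises 7 and 23 can be checked against the values quoted in exercises (25) and (26). The sketch below assumes interval midpoints 34.5, 44.5, ... and places the arbitrary origin at the midpoint of the 100-109 interval; with these choices it reproduces M = 106.68, SD = 17.41 for part A and M = 103.34, SD = 16.88 for part B of Table III. The function name is illustrative.

```python
import math

def mean_sd_arbitrary_origin(midpoints, freqs, origin, width):
    """Mean and SD of a grouped distribution by the arbitrary-origin method:
    code each interval as d = (midpoint - origin) / width, then
    M = origin + width * sum(f*d)/N and
    SD = width * sqrt(sum(f*d**2)/N - (sum(f*d)/N)**2)."""
    n = sum(freqs)
    d = [round((m - origin) / width) for m in midpoints]
    sum_fd = sum(f * di for f, di in zip(freqs, d))
    sum_fd2 = sum(f * di * di for f, di in zip(freqs, d))
    c = sum_fd / n                      # correction in interval units
    return origin + width * c, width * math.sqrt(sum_fd2 / n - c * c)

# Table III, part A: intervals 30-39 up to 160-169 (bottom to top).
mid_a = [34.5 + 10 * k for k in range(14)]
f_a = [1, 2, 5, 7, 20, 76, 107, 193, 175, 82, 41, 11, 4, 4]
# Table III, part B: intervals 40-49 up to 170-179 (bottom to top).
mid_b = [44.5 + 10 * k for k in range(14)]
f_b = [1, 4, 30, 76, 215, 335, 407, 308, 140, 73, 29, 3, 1, 1]

for label, mids, fs in (("A", mid_a, f_a), ("B", mid_b, f_b)):
    m, s = mean_sd_arbitrary_origin(mids, fs, origin=104.5, width=10)
    print(f"Part {label}: N = {sum(fs)}, M = {m:.2f}, SD = {s:.2f}")
```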





Chapter 4 

33 Assume that the IQ’s for a large number of unselected elementary school 
children are distributed as a normal curve with a mean of 100 and an 
SD of 17. 

a The first quartile point will be near what value? 
b The percentage with IQ’s above 130 will be? 
c The middle 80 per cent will fall between what values? 
d The 99th percentile will be near what IQ value? 
e The percentage with IQ’s below 70 will be? 

34. Let us presume that the Army General Classification Test yields a nor-
mal distribution of scores, with mean of 100 and SD of 20.

a. The value of the third quartile will be near what score?
b. The first percentile point will be at what score?
c. Between a score of 70 and a score of 130 will be found what percentage
of the cases?
d. The middle 60 per cent of scores will fall between what score values?
e. The value of the quartile deviation will be what?

35. One way to comprehend the meaning of either sizable or small differences
between groups is to consider the extent to which the distributions over-
lap. Given the following data for weights of college students:

Men: M = 142, σ = 15; Women: M = 120, σ = 12.

Assuming normality for both distributions, how many men per thousand
are lighter than the average woman? Determine the number of women
per thousand who are heavier than the average man.

36. If the mean height for college men is 68.5 inches and the SD is 2.8, and
if the mean height for college women is 64.5 and the SD is 2.5, what pro-
portion of women exceed the average man in height? What proportion
of men fall below the average height for women?

37. If the IQ's of children of professional people average 116, with a σ of 12,
what percentage of such children would you expect to fall below 100, the
general average (assume normality)?

38 Using the data of exercise (36), determine the percentage of men who are 
more than 6 feet tall, and the percentage of women who are shorter than 
5 feet. 

39. Suppose that the distribution of numerical grades in a course is normal
with a mean of 60 and an SD of 10. The instructor wishes to assign letter
grades as follows: 15 per cent A's, 35 per cent B's, 35 per cent C's, and
15 per cent D's. Determine to the nearest score the dividing line between
the A's and B's, between the B's and C's, and between the C's and D's.

40. Suppose that it has been decided to use a five-letter grading system,
A, B, C, D, and E, and that it is required that the letters shall correspond
to "equal" distances on the base line, the whole of which is taken to be
six sigmas. Assuming normality, what percentage would be assigned A's?
B's? C's?




41. Determine the height of the unit normal curve at the point which is 1.2
sigma units below the median; at the third quartile point; at the point
which is two Q's below the median.

42. What is the height of the ordinate of the unit normal curve corresponding
to the x/σ value that cuts off the upper 10 per cent of the curve? The
lower 25 per cent? The upper 2.5 per cent?

43. Frequently, one must be able to translate percentile scores to standard
scores and vice versa (assume normality).

a. What are the standard scores (to the nearest tenth) which correspond
to the following percentiles: 67th, 44th, 20th, 99th?
b. What are the percentile equivalents (to nearest value) of the following
standard scores: +1.04, −1.34, −1.75, +2.06?

44. Suppose a typical bell-shaped distribution.

a. What is the approximate percentile value of the following points: the
mean, Q3, the point which is one SD above the mean, the first decile
point?
b. What is the percentile value of an IQ of 140? 80? [See exercise (26)
for needed mean and SD.]

45. What is the sigma distance between the following (assume normality)?
a. the 10th and the 90th percentile points
b. the 25th and the 75th percentile points
c. the 1st and the 99th percentile points

46. If a distribution of scores is normal, what is the sigma distance between
the 10th and 20th percentile points? between the 40th and 50th percentile
points?

47. Given that a reading test for unselected 10-year-olds yields a mean of 50
and an SD of 10, while an arithmetic test gives a mean of 48 and an SD
of 8. If Joe Bloke scores 52 on reading and 50 on arithmetic, is he better
in reading than in arithmetic? Why?

48. If a student's reading rate score falls at the 20th percentile, and his stand-
ard score on reading comprehension is −.14, would you conclude that his
comprehension was superior to his rate? Why?

49. If the scores of a positively skewed distribution are converted to percentile
scores, what will be the form of the distribution of percentile scores?

50. If the scores of a skewed distribution were converted to standard scores,
what would be the form of the resulting distribution of standard scores?
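The normal-curve questions in exercises 33 to 48 reduce to three operations: the area below a standard score, the standard score for a given area, and the height of the ordinate. Below is a minimal sketch using Python's standard-library `statistics.NormalDist` (Python 3.8 or later) as an alternative to the printed normal-curve table; the helper names are illustrative.

```python
from statistics import NormalDist

UNIT = NormalDist()  # unit normal curve: mean 0, SD 1

def percentile_to_z(pct):
    """Standard score below which pct per cent of a normal distribution falls."""
    return UNIT.inv_cdf(pct / 100)

def z_to_percentile(z):
    """Percentage of a normal distribution falling below standard score z."""
    return 100 * UNIT.cdf(z)

def ordinate(z):
    """Height of the unit normal curve at standard score z."""
    return UNIT.pdf(z)

def proportion_between(lo, hi, mean, sd):
    """Proportion of a normal distribution (given mean, SD) between lo and hi."""
    dist = NormalDist(mean, sd)
    return dist.cdf(hi) - dist.cdf(lo)

print(round(z_to_percentile(1.0), 1))             # 84.1
print(round(ordinate(0.0), 4))                    # 0.3989
print(round(proportion_between(-1, 1, 0, 1), 4))  # 0.6827
```

For a raw score X from a distribution with mean M and SD σ, convert first to z = (X − M)/σ, or pass M and σ directly to `proportion_between`.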

Chapter 5 

51. If you tossed four unbiased pennies 160 times, how often would you expect
to have two heads and two tails?

52 Suppose you roll a pair of fair dice once. What is the probability that 
exactly eleven spots will turn up? 

53. Suppose that you are rolling two fair dice, one red and the other white.
What is the probability of obtaining a three spot on the red die and a
four spot on the white one?





54. In that back-alley game known as "crap shooting," the obtaining of spots
on the 2 dice totalling 7 seems to be of paramount importance at
certain times. What is the probability of rolling a 7 (assume gentlemen's
dice)?

55. Suppose that we have 3 pyramidal objects (perfectly homogeneous)
which can be rolled like dice. The sides of each are numbered 1, 2, 3, 4,
and success is defined as the getting of 4's on the down sides. Determine
the probability for obtaining exactly three 4's, exactly two 4's, exactly
one 4, and no 4's. What is the probability of securing at least two 4's?

56. If you were dealt 1 card from each of 5 well-shuffled decks, what is the
probability of all 5 cards being spades?

57. The probability of drawing a red card from an ordinary (and well-shuffled)
deck is 1/2 and the probability of drawing a heart is 1/4. Why isn't 1/2 plus 1/4
the probability of drawing either a heart or a red card, or is it?

58. Suppose that for a class of 100 the number of A's given on the first quiz
is 15 and that the number of A's on the second quiz is also 15. Suppose
further that the names of the students are placed on slips which are then
well mixed in a hat. We might say that the probability is .15 that a name
drawn from the hat will be that of a student who received an A on the
first quiz; likewise, the second quiz. Why might it be erroneous to say
that the probability is .15 times .15 that the drawn name belongs to a
student who made A's on both quizzes?

59. Suppose a true-false quiz of 6 questions; what is the probability of secur-
ing a perfect score on a chance (or pure guessing) basis?

60. Some folks have pointed out that the blindfold test of the ability to dis-
tinguish brands of cigarettes is befuddled with guessing. Suppose that
you are to test a friend who claims that he can tell the difference between
Luckies and Camels. At 5-minute intervals you present him with a
cigarette, either a Camel or a Lucky according to the flip of a coin, until
he has tried 8 cigarettes. What is the probability that he would by chance
alone name all 8 cigarettes correctly?

61. a. Toss 6 coins 64 times; for each toss tally the number of heads that
turn up, thereby obtaining a frequency distribution with an N of 64.
Label this Series A. Toss the coins 64 more times, and label the result-
ing distribution as Series B. Then combine the 2 distributions.
b. Using the binomial expansion, ascertain the expected distribution when
6 coins are tossed 64 times; 128 times.
c. Compute the mean and standard deviation for each of your 3 dis-
tributions, also for the expected distribution (round to 2 decimals).
d. Determine the proportion of times that 3 heads, also 6 heads, turned up
in each series, and in the combined series. Compare these results with
the expected proportions.
e. Subtract the mean of Series A from that of Series B (keep sign if nega-
tive). For the proportion of times 3 heads turned up, subtract the
Series A proportion from that for Series B (keep sign).
f. Bring all the results to class so that frequency distributions may be
made for M's, σ's, proportions, and differences between M's and between
proportions.





62. Do exercise (61), using 7 coins 

63. If 42 of 60 rats turn to the right at the first choice point in a maze, would
you conclude that rats, in general, prefer to turn to the right at this choice
point? First get your answer by using the appropriate standard error,
then check by using the binomial expansion and normal curve approxi-
mation thereto.

64. If at a particular time 50 per cent of all eligible voters favor the Demo-
crats, how often would polls based upon random samples of size 400 yield
percentages of 55 or over as favoring Democrats?

65. Items on an intelligence test of the Binet type are at times assigned an
index of difficulty which is nothing more than the percentage passing the
item. Given the following for an item: of 100 12-year-olds, 60 per cent
passed; of 100 13-year-olds, 80 per cent passed. When possible sampling
errors are considered, would you conclude from these 2 difficulty indices
that the item is really more difficult for 12-year-olds? State the signifi-
cance level associated with your conclusion.

66. If a political issue is favored by 55 per cent of a sample of 200 Republi-
cans, and by 46 per cent of a sample of 250 Democrats, would you con-
clude that the populations of Republicans and Democrats differ on the
issue?

Table IV. Data for Passing (P) and Failing (F) Items on the
Stanford-Binet Test

4-year-olds: Case, Item a, Item b        5-year-olds: Case, Item a, Item b

[Case-by-case entries illegible in the source scan.]

67. a. Given the data in Table IV, do items a and b differ significantly in
difficulty for the 4-year-olds? Ditto, the 5-year-olds?
b. Is there a significant difference between 4- and 5-year-olds on item a?
On item b?

68. a. Would you conclude from the data of Table V that items c and d differ
significantly in difficulty for the 6-year-olds? Ditto, the 7-year-olds?
b. Would you conclude from the data of Table V that, in general, 7-year-
olds are more successful than 6-year-olds on item c? On item d?
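For exercise 61b and the other coin problems, the expected distribution is the number of tosses times the successive terms of the binomial expansion (p + q)^n. A minimal sketch using `math.comb` (Python 3.8 or later) follows; the function names are illustrative.

```python
from math import comb

def expected_binomial_frequencies(n_coins, n_tosses, p=0.5):
    """Expected frequency of 0, 1, ..., n_coins heads when n_coins coins are
    tossed n_tosses times: n_tosses * C(n, k) * p**k * q**(n - k)."""
    q = 1 - p
    return [n_tosses * comb(n_coins, k) * p ** k * q ** (n_coins - k)
            for k in range(n_coins + 1)]

def binomial_mean_sd(n, p=0.5):
    """Mean np and standard deviation sqrt(npq) of the number of successes."""
    return n * p, (n * p * (1 - p)) ** 0.5

print(expected_binomial_frequencies(6, 64))  # [1.0, 6.0, 15.0, 20.0, 15.0, 6.0, 1.0]
print(binomial_mean_sd(6))
```

For 6 coins tossed 64 times each expected frequency is simply C(6, k), since 64 × (1/2)^6 = 1; doubling the number of tosses doubles every entry.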

Table V. Passing (P) and Failing (F) Information on Two
Binet Test Items at Two Age Levels

              6-year-olds                         7-year-olds
Case  c  d      Case  c  d          Case  c  d      Case  c  d
   1  F  F        21  P  P            41  P  F        61  P  F
   2  P  P        22  P  F            42  P  P        62  F  F
   3  F  P        23  F  F            43  F  P        63  P  F
   4  F  F        24  F  F            44  P  P        64  F  F
   5  F  F        25  F  F            45  F  P        65  P  P
   6  F  F        26  P  P            46  F  F        66  P  P
   7  P  F        27  P  F            47  P  P        67  P  P
   8  F  F        28  P  F            48  F  P        68  P  P
   9  P  F        29  F  F            49  P  P        69  P  P
  10  F  F        30  F  F            50  P  F        70  P  P
  11  P  F        31  F  F            51  F  F        71  P  P
  12  P  P        32  F  F            52  F  F        72  P  P
  13  F  F        33  P  F            53  F  F        73  P  F
  14  P  F        34  P  F            54  P  P        74  P  F
  15  F  F        35  F  F            55  F  F        75  F  F
  16  P  P        36  F  F            56  P  P        76  P  P
  17  F  F        37  F  P            57  F  F        77  P  F
  18  F  F        38  P  F            58  P  P        78  P  F
  19  F  F        39  F  F            59  P  P        79  P  F
  20  F  F        40  P  P            60  F  F        80  P  P
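Exercise 68a compares two proportions obtained from the same children, so the proportions are correlated. One standard large-sample approach, assumed here (the test for correlated proportions often called McNemar's test), uses only the children who pass one item and fail the other: z = (b − c)/√(b + c), which equals (p1 − p2) divided by √(b + c)/N. The records in the example are hypothetical and the function name is illustrative; to apply the sketch to Table V, one would enter the 40 (c, d) pairs for a single age group.

```python
import math

def correlated_proportions_test(records):
    """records: list of (item1, item2) outcomes, each 'P' or 'F', for the same
    children. Returns (p1, p2, z) with z = (b - c) / sqrt(b + c), where b
    counts P on item 1 and F on item 2, and c counts the reverse."""
    n = len(records)
    p1 = sum(1 for x, _ in records if x == "P") / n
    p2 = sum(1 for _, y in records if y == "P") / n
    b = sum(1 for x, y in records if x == "P" and y == "F")
    c = sum(1 for x, y in records if x == "F" and y == "P")
    # Cases passing or failing both items do not affect the difference p1 - p2.
    z = (b - c) / math.sqrt(b + c) if b + c else 0.0
    return p1, p2, z

# Hypothetical records for 20 children.
records = ([("P", "P")] * 5 + [("P", "F")] * 9 +
           [("F", "P")] * 1 + [("F", "F")] * 5)
p1, p2, z = correlated_proportions_test(records)
print(p1, p2, round(z, 3))
```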


Chapter 6 

69. If you tossed 5 coins 100 times and obtained 2.8 as the mean number of
heads, would you suspect bias in the coins? Why? (Be specific.)

70. If you tossed 4 coins 100 times and obtained 2.4 as the mean number of
heads, would you suspect bias in the coins? Why? (Be specific.)




71. For a sample of 2970 cases, ages 2½ to 18, the distribution of IQ's on
Form L of the 1937 Stanford-Binet yields:

Mean = 104.00        Skewness (g1) = .028
SD = 17.03           Kurtosis (g2) = .346

In answering the following questions, indicate the steps in your computa-
tions.

a. Would you conclude that the mean IQ of the population for these ages
is 100 (the value expected for a properly constructed IQ test)?
b. Is it reasonable to believe that the IQ distribution for the population,
at these ages, has normal skewness?
c. Would you conclude from the sample kurtosis that the kurtosis for the
population differs from normal kurtosis?

72. Suppose that the mean IQ for the general population is 100 and the stand-
ard deviation is 17. If a sample of 289 cases were drawn at random, what
would be the probability of obtaining a mean as great as 101? As low
as 98?

73. Suppose it is known that the standard deviation of scores for a popula-
tion is 20. How many cases would you need to draw in order that the
standard error of

a. a sample mean be 2 score points?
b. a sample SD be 3 points?

74. Suppose that you are polling on an issue for which opinion seems about
equally divided. How many cases (how large an N) would you need to
be sure (at the .01 level of significance) that a sample deviation of 3 per
cent from 50 per cent is nonchance?

75. One of the requirements of a good IQ test is that the mean IQ for un-
selected cases of any school age group shall be 100, and that the distri-
butions for the several age groups shall have the same standard devia-
tions. Given the following for the 1937 Stanford-Binet Test:

Age      N       M       SD
 6      203    101.0    12.5
12      202    103.6    20.0

a. Is it reasonable to believe that the test is yielding the desired mean
when used with 12-year-olds?
b. Would you judge from the results for these 2 age groups that the re-
quirement of equal variability has been met?

76. The means and standard deviations for 2 groups of twins on spool pack-
ing are as follows:

        Fraternals    Identicals
N           92            94
M          761           741
SD          79            66

Do these groups differ significantly in mean performance? In variability?





77. Two forms of a test, to be comparable, should yield similar means and
similar standard deviations when given to a group. For 202 cases of age 7,
we have the following data for the 1937 Stanford-Binet:

        Form L    Form M
M       101.8     103.5
SD       16.2      15.6

In order to balance practice effect, one-half the group was tested on
Form L, then on Form M, while the reverse order was used for the other
half. The correlation between the 2 sets of IQ's was .93. Is the ob-
tained difference between means larger than one would expect on the basis
of chance sampling? Ditto, the difference between the SD's?

78. Measurements on 1000 of each sex at birth have been reported in the
literature. The mean length of boys (in centimeters) was 50.51 and the
SD was 2.99, and the values for the girls were 49.90 and 3.00. Is there
evidence here for sex difference in length at birth?

79. a. Given that the data for Group A in Table VI were collected to evaluate
the effect of practice from a first to a second administration of the
same test, is there evidence for a significant increase in performance?

Table VI. First (X1) and Second (X2) Test Scores for Two Groups

Group A

    X1  X2     X1  X2     X1  X2
    32  31     43  40     26  24
    35  36     34  37     37  44
    29  32     42  41     16  20
    52  57     40  44     34  36
    33  33     43  45     38  40
    28  29     30  32     31  36
    45  46     36  34     35  35
    41  44     29  29     27  29
    20  19     36  37     50  52
    48  48     27  31     39  39
    23  29     35  41     34  34
    27  20     37  38     28  31

Group B

    X1  X2     X1  X2     X1  X2
    32  31     53  65     25  32
    31  34     41  45     34  37
    30  30     21  27     19  29
    30  35     41  44     39  41
    46  50     28  33     37  33
    37  38     40  37     44  42
    36  34     47  57     24  35
    33  35     33  39     29  34
    33  41     35  40     51  67
    42  50     37  39     38  42
    20  20     36  39





b. Given that between the 2 testings of Group B, Table VI, special coach-
ing was provided on how to take the test. Would you judge that the
coaching of Group B led to an increase in scores which is significantly
larger than one would expect on the basis of the practice effect demon-
strated in the data for Group A?

80. Given that the experimental group of Table VII was provided with an
experience which should have produced a shift in scores, whereas the con-
trol group was exposed only to those ordinary experiences which were
presumably alike for both groups. The experimentals and controls were
paired on the basis of the pretest scores: case 1 with case 101, case 2
with 102, . . ., case 36 with 136. Did the provided experience have a
demonstrable effect?

Table VII. Before (X1) and After (X2) Measures on Two Groups
Matched by Pairing

Experimental Group

    Case  X1  X2    Case  X1  X2    Case  X1  X2    Case  X1  X2
      1   66  76     10   75  81     19   62  64     28   64  73
      2   78  88     11   57  64     20   78  95     29   71  70
      3   62  63     12   68  69     21   60  60     30   58  63
      4   52  56     13   51  50     22   54  53     31   70  76
      5   37  34     14   61  77     23   59  62     32   75  84
      6   65  69     15   70  85     24   66  81     33   70  75
      7   60  66     16   76  86     25   72  83     34   67  78
      8   85  93     17   63  64     26   86  96     35   57  64
      9   64  66     18   46  49     27   49  46     36   55  63

Control Group

    Case  X1  X2    Case  X1  X2    Case  X1  X2    Case  X1  X2
    101   67  71    110   74  77    119   61  63    128   64  67
    102   77  84    111   56  57    120   80  88    129   71  67
    103   62  64    112   68  66    121   61  63    130   58  59
    104   53  51    113   50  43    122   54  53    131   70  76
    105   41  37    114   61  72    123   58  59    132   75  85
    106   65  68    115   70  74    124   66  79    133   68  72
    107   59  61    116   76  82    125   73  79    134   67  75
    108   83  92    117   63  66    126   90  98    135   57  60
    109   65  57    118   45  49    127   47  44    136   55  55
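The large-sample and small-sample tests called for in Exercises 75 to 80 reduce to a few critical ratios and a t on paired differences. The sketch below is a minimal Python illustration, not the text's own worked method: it assumes the reported SDs can be used directly in the standard errors s/√N (for a mean) and s/√(2N) (for an SD), and all function names are hypothetical.

```python
import math

def cr_independent(m1, s1, n1, m2, s2, n2):
    """Critical ratio for the difference between two independent means,
    using the large-sample standard error s / sqrt(N) for each mean."""
    se_diff = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return (m1 - m2) / se_diff

def cr_independent_sd(s1, n1, s2, n2):
    """Critical ratio for the difference between two independent SDs,
    using the large-sample standard error s / sqrt(2N) for each SD."""
    se_diff = math.sqrt(s1 ** 2 / (2 * n1) + s2 ** 2 / (2 * n2))
    return (s1 - s2) / se_diff

def cr_correlated(m1, s1, m2, s2, n, r):
    """Critical ratio for two means obtained on the same N cases;
    r is the correlation between the paired measures."""
    se1 = s1 / math.sqrt(n)
    se2 = s2 / math.sqrt(n)
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2 - 2 * r * se1 * se2)
    return (m1 - m2) / se_diff

def t_paired(x1, x2):
    """t for the mean of the paired differences d = x2 - x1.
    Returns (t, degrees of freedom), with df = N - 1."""
    d = [b - a for a, b in zip(x1, x2)]
    n = len(d)
    mean_d = sum(d) / n
    sd_d = math.sqrt(sum((v - mean_d) ** 2 for v in d) / (n - 1))
    return mean_d / (sd_d / math.sqrt(n)), n - 1

# Exercise 78: boys vs. girls, length at birth (1000 of each sex).
print(round(cr_independent(50.51, 2.99, 1000, 49.90, 3.00, 1000), 2))
```

For Exercise 78 the ratio comes out at about 4.55. For Exercise 79a, `t_paired` can be applied to the Group A pairs of Table VI and the result referred to Table E with N − 1 degrees of freedom. For Exercise 80, one option is to compute the gain X2 − X1 for every case and pass the control gains and the experimental gains, in paired order, as `x1` and `x2`.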





Chapters 8 and 9 

81. a. Using the data of Table VIII, make a scatter diagram with “Ex” on
the y axis, intervals of size 5, and with “TMT” on the x axis, with
i = 3 and the first interval taken as 105-107 (interval sizes are suggested
in order to facilitate an exact check of the tallying and subsequent
computations).

b. From the scatter diagram, compute the correlation between “Ex” and
“TMT”; also compute the 2 means and the 2 standard deviations.

c. Write the regression equation for predicting “Ex” from “TMT”, and draw
the regression line on your scatter diagram.

d. Determine the error of estimate for predicting “Ex” from a knowledge
of “TMT”.

e. What percentage of the variance in “Ex” is due to or associated with
variation in “TMT”?

82. Do exercise (81) with “CM” substituted for “TMT” (an appropriate
interval size for “CM” is rather obvious).

Table VIII. Data for 38 Students in a Course on Mental Tests

(“Ex” stands for final examination scores; “TMT” stands for IQ's based on
the Terman-McNemar Test of Mental Ability; “CM” stands for scores on the
Terman Concept Mastery Test.)

       Ex  TMT   CM        Ex  TMT   CM        Ex  TMT   CM
       62  123   47       106  125  126        54  128   69
      107  129   50        79  109   33        86  132   82
       87  131   78        84  120   56         ?  114   41
       95  129   74       100  129   81        67  113   44
      100  122   52        78  112   51       102  141  112
       87  136  127         ?  132  110        79  132   72
       87  125   74        85  126   54         ?  126   54
       64  121   46         ?  111   33         ?  131  111
       89  131   97       110  138  110         ?  131   65
        ?  128   71       115  131  138        75  109   22
       84  123   28        68  129   39        93  131  108
       80  127   53        78  123   67        67  106   25
       82  120   53         ?  136  101





83. Using the data of Table IX, compute a “reliability coefficient” for Stanford-
Binet IQ's, and determine the standard error of measurement. (Although
the data in Table IX are in general typical of those used in determining
test reliability by the form vs. form method, they are definitely atypical
as regards the Stanford-Binet Test. The inquisitive student may wish
to read Chapter 6 in Q. McNemar's The revision of the Stanford-Binet
Scale, Boston: Houghton Mifflin, 1942.)





Table IX. IQ Scores for 100 Children Who Were Tested on Both
Form L and Form M of the 1937 Stanford-Binet Test

    L    M       L    M       L    M       L    M
   89   85      93   86      95   93      88   99
   88   94      70   74      63   65      88   89
   90   89     123  121     104  108      77   71
   98   96      93   86     124  125      99   98
  110  107      88  105      89   89      77   82
  118  121      76   72     110   98      94   98
  117  118      98  100     110  110      96   93
  113  110     116  115     122  118      70   70
  119  119     108  112     126  125     106  106
  102  110     113  119     113  122      87   91
   99   93     125  121      99  102     122  115
  100  104      96   89     104   93      63   61
   89   81     135  134     120  118     113  105
   98  100      83   92     111  112      64   64
  118  112     112  115      91   88      96   94
   95  103      92   91     116  114     107  100
  133  133      77   82      85   87      98   91
   93   93      74   66      86   83      94   93
  116  102      83   76     113  107      87   96
  143  147      85   83      99   99     104  110
   94   98      94   90     103  102      82   78
   93   96      86   82     127  126     100  102
  124  130     123  124     110  117      78   78
  124  119     107  112     138  140      97   95
  129  127     138  124      93   96     107  110
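Exercises 81 to 83 call for a correlation, a regression equation, an error of estimate, and a standard error of measurement. The sketch below computes these from ungrouped raw scores rather than from a scatter diagram, so its results may differ slightly from answers based on grouped data. It assumes SDs with N in the denominator, and the function names are hypothetical.

```python
import math

def mean_sd(xs):
    """Mean and standard deviation, with N in the denominator."""
    n = len(xs)
    m = sum(xs) / n
    return m, math.sqrt(sum((x - m) ** 2 for x in xs) / n)

def pearson_r(xs, ys):
    """Product-moment correlation from ungrouped raw scores."""
    mx, sx = mean_sd(xs)
    my, sy = mean_sd(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (sx * sy)

def regression_y_on_x(xs, ys):
    """Intercept a and slope b of Y' = a + bX, and the error of estimate."""
    mx, sx = mean_sd(xs)
    my, sy = mean_sd(ys)
    r = pearson_r(xs, ys)
    b = r * sy / sx
    a = my - b * mx
    # max() guards against a tiny negative value from rounding when |r| = 1
    se_est = sy * math.sqrt(max(0.0, 1 - r ** 2))
    return a, b, se_est

def se_measurement(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - r11)."""
    return sd * math.sqrt(1 - reliability)

# The percentage of variance in Y associated with X is 100 * r ** 2.
```

For Exercise 83, `pearson_r` applied to the Form L and Form M columns of Table IX would serve as the form-versus-form reliability coefficient, and `se_measurement` converts it to a standard error of measurement. Which SD to supply (Form L, Form M, or an average of the two) is a choice the sketch leaves to the user.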



APPENDIX 

Tables A to G 





Table A. Normal Curve Functions

 z or     Area,      Area in    Ordinate,
 x/σ     m to z    smaller, q       y
  .00    .00000     .50000      .3989
  .05    .01994     .48006      .3984
  .10    .03983     .46017      .3970
  .15    .05962     .44038      .3945
  .20    .07926     .42074      .3910
  .25    .09871     .40129      .3867
  .30    .11791     .38209      .3814
  .35    .13683     .36317      .3752
  .40    .15542     .34458      .3683
  .45    .17364     .32636      .3605
  .50    .19146     .30854      .3521
  .55    .20884     .29116      .3429
  .60    .22575     .27425      .3332
  .65    .24215     .25785      .3230
  .70    .25804     .24196      .3123
  .75    .27337     .22663      .3011
  .80    .28814     .21186      .2897
  .85    .30234     .19766      .2780
  .90    .31594     .18406      .2661
  .95    .32894     .17106      .2541
 1.00    .34134     .15866      .2420
 1.05    .35314     .14686      .2299
 1.10    .36433     .13567      .2179
 1.15    .37493     .12507      .2059
 1.20    .38493     .11507      .1942
 1.25    .39435     .10565      .1826
 1.30    .40320     .09680      .1714
 1.35    .41149     .08851      .1604
 1.40    .41924     .08076      .1497
 1.45    .42647     .07353      .1394





Table A. Normal Curve Functions (Continued)

 z or     Area,      Area in    Ordinate,
 x/σ     m to z    smaller, q       y
 1.50    .43319     .06681      .1295
 1.55    .43943     .06057      .1200
 1.60    .44520     .05480      .1109
 1.65    .45053     .04947      .1023
 1.70    .45543     .04457      .0940
 1.75    .45994     .04006      .0863
 1.80    .46407     .03593      .0790
 1.85    .46784     .03216      .0721
 1.90    .47128     .02872      .0656
 1.95    .47441     .02559      .0596
 2.00    .47725     .02275      .0540
 2.05    .47982     .02018      .0488
 2.10    .48214     .01786      .0440
 2.15    .48422     .01578      .0396
 2.20    .48610     .01390      .0355
 2.25    .48778     .01222      .0317
 2.30    .48928     .01072      .0283
 2.35    .49061     .00939      .0252
 2.40    .49180     .00820      .0224
 2.45    .49286     .00714      .0198
 2.50    .49379     .00621      .0175
 2.55    .49461     .00539      .0154
 2.60    .49534     .00466      .0136
 2.65    .49598     .00402      .0119
 2.70    .49653     .00347      .0104
 2.75    .49702     .00298      .0091
 2.80    .49744     .00256      .0079
 2.85    .49781     .00219      .0069
 2.90    .49813     .00187      .0060
 2.95    .49841     .00159      .0051
 3.00    .49865     .00135      .0044
 3.25    .49942     .00058      .0020
 3.50    .49977     .00023      .0009
 3.75    .49991     .00009      .0004
 4.00    .49997     .00003      .0001
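The three columns of Table A can be regenerated from the error function in Python's standard library, which is handy for checking an entry or for a z that falls between the tabled values. This is a minimal sketch for that purpose, not a description of how the table was originally computed; the function names are hypothetical.

```python
import math

def area_mean_to_z(z):
    """Area under the unit normal curve between the mean and z."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2))

def area_smaller(z):
    """Area in the smaller portion, beyond z."""
    return 0.5 * math.erfc(abs(z) / math.sqrt(2))

def ordinate(z):
    """Height y of the unit normal curve at z."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

for z in (0.0, 0.5, 1.0, 1.5, 2.0, 3.0):
    print(f"{z:4.2f}  {area_mean_to_z(z):.5f}  {area_smaller(z):.5f}  {ordinate(z):.4f}")
```

Using `erfc` for the smaller portion, rather than 0.5 minus the first column, keeps full relative precision far out in the tail.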




Table B. Transformation of r to z

   r       z        r       z        r       z
 .01    .010      .34    .354      .67    .811
 .02    .020      .35    .366      .68    .829
 .03    .030      .36    .377      .69    .848
 .04    .040      .37    .389      .70    .867
 .05    .050      .38    .400      .71    .887
 .06    .060      .39    .412      .72    .908
 .07    .070      .40    .424      .73    .929
 .08    .080      .41    .436      .74    .950
 .09    .090      .42    .448      .75    .973
 .10    .100      .43    .460      .76    .996
 .11    .110      .44    .472      .77   1.020
 .12    .121      .45    .485      .78   1.045
 .13    .131      .46    .497      .79   1.071
 .14    .141      .47    .510      .80   1.099
 .15    .151      .48    .523      .81   1.127
 .16    .161      .49    .536      .82   1.157
 .17    .172      .50    .549      .83   1.188
 .18    .181      .51    .563      .84   1.221
 .19    .192      .52    .577      .85   1.256
 .20    .203      .53    .590      .86   1.293
 .21    .214      .54    .604      .87   1.333
 .22    .224      .55    .618      .88   1.376
 .23    .234      .56    .633      .89   1.422
 .24    .245      .57    .648      .90   1.472
 .25    .256      .58    .663      .91   1.528
 .26    .266      .59    .678      .92   1.589
 .27    .277      .60    .693      .93   1.658
 .28    .288      .61    .709      .94   1.738
 .29    .299      .62    .725      .95   1.832
 .30    .309      .63    .741      .96   1.946
 .31    .321      .64    .758      .97   2.092
 .32    .332      .65    .775      .98   2.298
 .33    .343      .66    .793      .99   2.647



Table C. Transformation of z to r *

  z     .00   .01   .02   .03   .04   .05   .06   .07   .08   .09
0.0   .0000 .0100 .0200 .0300 .0400 .0500 .0599 .0699 .0798 .0898
0.1   .0997 .1096 .1194 .1293 .1391 .1489 .1586 .1684 .1781 .1877
0.2   .1974 .2070 .2165 .2260 .2355 .2449 .2543 .2636 .2729 .2821
0.3   .2913 .3004 .3095 .3185 .3275 .3364 .3452 .3540 .3627 .3714
0.4   .3800 .3885 .3969 .4053 .4136 .4219 .4301 .4382 .4462 .4542
0.5   .4621 .4699 .4777 .4854 .4930 .5005 .5080 .5154 .5227 .5299
0.6   .5370 .5441 .5511 .5580 .5649 .5717 .5784 .5850 .5915 .5980
0.7   .6044 .6107 .6169 .6231 .6291 .6351 .6411 .6469 .6527 .6584
0.8   .6640 .6696 .6751 .6805 .6858 .6911 .6963 .7014 .7064 .7114
0.9   .7163 .7211 .7259 .7306 .7352 .7398 .7443 .7487 .7531 .7574
1.0   .7616 .7658 .7699 .7739 .7779 .7818 .7857 .7895 .7932 .7969
1.1   .8005 .8041 .8076 .8110 .8144 .8178 .8210 .8243 .8275 .8306
1.2   .8337 .8367 .8397 .8426 .8455 .8483 .8511 .8538 .8565 .8591
1.3   .8617 .8643 .8668 .8692 .8717 .8741 .8764 .8787 .8810 .8832
1.4   .8854 .8875 .8896 .8917 .8937 .8957 .8977 .8996 .9015 .9033
1.5   .9051 .9069 .9087 .9104 .9121 .9138 .9154 .9170 .9186 .9201
1.6   .9217 .9232 .9246 .9261 .9275 .9289 .9302 .9316 .9329 .9341
1.7   .9354 .9366 .9379 .9391 .9402 .9414 .9425 .9436 .9447 .9458
1.8   .9468 .9478 .9488 .9498 .9508 .9518 .9527 .9536 .9545 .9554
1.9   .9562 .9571 .9579 .9587 .9595 .9603 .9611 .9618 .9626 .9633
2.0   .9640 .9647 .9654 .9661 .9668 .9674 .9680 .9686 .9693 .9699
2.1   .9704 .9710 .9716 .9722 .9727 .9732 .9738 .9743 .9748 .9753
2.2   .9757 .9762 .9767 .9771 .9776 .9780 .9785 .9789 .9793 .9797
2.3   .9801 .9805 .9809 .9812 .9816 .9820 .9823 .9827 .9830 .9834
2.4   .9837 .9840 .9843 .9846 .9849 .9852 .9855 .9858 .9861 .9864
2.5   .9866 .9869 .9871 .9874 .9876 .9879 .9881 .9884 .9886 .9888
2.6   .9890 .9892 .9894 .9897 .9899 .9901 .9903 .9904 .9906 .9908
2.7   .9910 .9912 .9914 .9915 .9917 .9919 .9920 .9922 .9923 .9925
2.8   .9926 .9928 .9929 .9931 .9932 .9933 .9935 .9936 .9937 .9938
2.9   .9940 .9941 .9942 .9943 .9944 .9945 .9946 .9948 .9948 .9950

* Table C is abridged from Table VII of Fisher and Yates, Statistical tables
for biological, agricultural and medical research, Oliver and Boyd, Ltd., Edin-
burgh, by permission of the authors and publishers.
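Tables B and C are inverse transformations: z = ½ ln[(1 + r)/(1 − r)] and r = tanh z. A minimal sketch of both follows. The interval function is an added illustration, not part of the tables; it uses the standard error 1/√(N − 3) of z and 1.96 as the assumed two-sided .05 multiplier.

```python
import math

def r_to_z(r):
    """Table B: z = (1/2) ln[(1 + r) / (1 - r)]."""
    return 0.5 * math.log((1 + r) / (1 - r))

def z_to_r(z):
    """Table C: r = tanh z, the inverse of r_to_z."""
    return math.tanh(z)

def rho_interval(r, n, z_crit=1.96):
    """Approximate interval for the population r, using SE(z) = 1 / sqrt(N - 3)."""
    z = r_to_z(r)
    half_width = z_crit / math.sqrt(n - 3)
    return z_to_r(z - half_width), z_to_r(z + half_width)

print(round(r_to_z(0.50), 3), round(z_to_r(1.00), 4))
```

Because the interval is symmetric in z rather than in r, it comes out asymmetric about r once transformed back, which is the usual reason for working in z.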





Table D. Distribution of χ² *

  n P = .99     .98     .95     .90     .80     .70     .50
  1  .00016  .00063   .0039    .016    .064     .15     .46
  2     .02     .04     .10     .21     .45     .71    1.39
  3     .12     .18     .35     .58    1.00    1.42    2.37
  4     .30     .43     .71    1.06    1.65    2.20    3.36
  5     .55     .75    1.14    1.61    2.34    3.00    4.35
  6     .87    1.13    1.64    2.20    3.07    3.83    5.35
  7    1.24    1.56    2.17    2.83    3.82    4.67    6.35
  8    1.65    2.03    2.73    3.49    4.59    5.53    7.34
  9    2.09    2.53    3.32    4.17    5.38    6.39    8.34
 10    2.56    3.06    3.94    4.86    6.18    7.27    9.34
 11    3.05    3.61    4.58    5.58    6.99    8.15   10.34
 12    3.57    4.18    5.23    6.30    7.81    9.03   11.34
 13    4.11    4.76    5.89    7.04    8.63    9.93   12.34
 14    4.66    5.37    6.57    7.79    9.47   10.82   13.34
 15    5.23    5.98    7.26    8.55   10.31   11.72   14.34
 16    5.81    6.61    7.96    9.31   11.15   12.62   15.34
 17    6.41    7.26    8.67   10.08   12.00   13.53   16.34
 18    7.02    7.91    9.39   10.86   12.86   14.44   17.34
 19    7.63    8.57   10.12   11.65   13.72   15.35   18.34
 20    8.26    9.24   10.85   12.44   14.58   16.27   19.34
 21    8.90    9.92   11.59   13.24   15.44   17.18   20.34
 22    9.54   10.60   12.34   14.04   16.31   18.10   21.34
 23   10.20   11.29   13.09   14.85   17.19   19.02   22.34
 24   10.86   11.99   13.85   15.66   18.06   19.94   23.34
 25   11.52   12.70   14.61   16.47   18.94   20.87   24.34
 26   12.20   13.41   15.38   17.29   19.82   21.79   25.34
 27   12.88   14.12   16.15   18.11   20.70   22.72   26.34
 28   13.56   14.85   16.93   18.94   21.59   23.65   27.34
 29   14.26   15.57   17.71   19.77   22.48   24.58   28.34
 30   14.95   16.31   18.49   20.60   23.36   25.51   29.34

* Table D is abridged from Table IV of Fisher and Yates, Statistical tables
for biological, agricultural and medical research, Oliver and Boyd, Ltd.,
Edinburgh, by permission of the authors and publishers.





Table D. Distribution of χ² (Continued)

  n P = .30     .20     .10     .05     .02     .01    .001
  1    1.07    1.64    2.71    3.84    5.41    6.64   10.83
  2    2.41    3.22    4.60    5.99    7.82    9.21   13.82
  3    3.66    4.64    6.25    7.82    9.84   11.34   16.27
  4    4.88    5.99    7.78    9.49   11.67   13.28   18.46
  5    6.06    7.29    9.24   11.07   13.39   15.09   20.52
  6    7.23    8.56   10.64   12.59   15.03   16.81   22.46
  7    8.38    9.80   12.02   14.07   16.62   18.48   24.32
  8    9.52   11.03   13.36   15.51   18.17   20.09   26.12
  9   10.66   12.24   14.68   16.92   19.68   21.67   27.88
 10   11.78   13.44   15.99   18.31   21.16   23.21   29.59
 11   12.90   14.63   17.28   19.68   22.62   24.72   31.26
 12   14.01   15.81   18.55   21.03   24.05   26.22   32.91
 13   15.12   16.98   19.81   22.36   25.47   27.69   34.53
 14   16.22   18.15   21.06   23.68   26.87   29.14   36.12
 15   17.32   19.31   22.31   25.00   28.26   30.58   37.70
 16   18.42   20.46   23.54   26.30   29.63   32.00   39.25
 17   19.51   21.62   24.77   27.59   31.00   33.41   40.79
 18   20.60   22.76   25.99   28.87   32.35   34.80   42.31
 19   21.69   23.90   27.20   30.14   33.69   36.19   43.82
 20   22.78   25.04   28.41   31.41   35.02   37.57   45.32
 21   23.86   26.17   29.62   32.67   36.34   38.93   46.80
 22   24.94   27.30   30.81   33.92   37.66   40.29   48.27
 23   26.02   28.43   32.01   35.17   38.97   41.64   49.73
 24   27.10   29.55   33.20   36.42   40.27   42.98   51.18
 25   28.17   30.68   34.38   37.65   41.57   44.31   52.62
 26   29.25   31.80   35.56   38.88   42.86   45.64   54.05
 27   30.32   32.91   36.74   40.11   44.14   46.96   55.48
 28   31.39   34.03   37.92   41.34   45.42   48.28   56.89
 29   32.46   35.14   39.09   42.56   46.69   49.59   58.30
 30   33.53   36.25   40.26   43.77   47.96   50.89   59.70
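Table D lists values of χ² that are exceeded with probability P for n degrees of freedom. As a hedged sketch for spot-checking entries, the tail probability can be recomputed from the power series for the regularized incomplete gamma function. This is one standard numerical approach, chosen here only because it needs nothing beyond Python's standard library; the function name is hypothetical.

```python
import math

def chi2_upper_p(chi2, n):
    """P that chi-square with n degrees of freedom equals or exceeds chi2.

    Uses the power series for the regularized lower incomplete gamma
    function P(a, x) with a = n/2 and x = chi2/2; the answer is 1 - P.
    """
    if chi2 <= 0:
        return 1.0
    a, x = n / 2.0, chi2 / 2.0
    # Leading term x**a * e**(-x) / Gamma(a + 1), computed via logarithms.
    term = math.exp(a * math.log(x) - x - math.lgamma(a + 1))
    total = term
    k = 1
    while term > 1e-16 * total and k < 10000:
        term *= x / (a + k)   # next term of the series
        total += term
        k += 1
    return max(0.0, 1.0 - total)

print(round(chi2_upper_p(3.84, 1), 3), round(chi2_upper_p(18.31, 10), 3))
```

Because the answer is formed as 1 minus a sum close to 1, the sketch loses relative accuracy for very small P; it is adequate for the .001 level of Table D but not for much smaller probabilities.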





Table E. Distribution of t *

  n P = .10     .05     .02     .01    .001
  1   6.314  12.706  31.821  63.657 636.619
  2   2.920   4.303   6.965   9.925  31.598
  3   2.353   3.182   4.541   5.841  12.941
  4   2.132   2.776   3.747   4.604   8.610
  5   2.015   2.571   3.365   4.032   6.859
  6   1.943   2.447   3.143   3.707   5.959
  7   1.895   2.365   2.998   3.499   5.405
  8   1.860   2.306   2.896   3.355   5.041
  9   1.833   2.262   2.821   3.250   4.781
 10   1.812   2.228   2.764   3.169   4.587
 11   1.796   2.201   2.718   3.106   4.437
 12   1.782   2.179   2.681   3.055   4.318
 13   1.771   2.160   2.650   3.012   4.221
 14   1.761   2.145   2.624   2.977   4.140
 15   1.753   2.131   2.602   2.947   4.073
 16   1.746   2.120   2.583   2.921   4.015
 17   1.740   2.110   2.567   2.898   3.965
 18   1.734   2.101   2.552   2.878   3.922
 19   1.729   2.093   2.539   2.861   3.883
 20   1.725   2.086   2.528   2.845   3.850
 21   1.721   2.080   2.518   2.831   3.819
 22   1.717   2.074   2.508   2.819   3.792
 23   1.714   2.069   2.500   2.807   3.767
 24   1.711   2.064   2.492   2.797   3.745
 25   1.708   2.060   2.485   2.787   3.725
 26   1.706   2.056   2.479   2.779   3.707
 27   1.703   2.052   2.473   2.771   3.690
 28   1.701   2.048   2.467   2.763   3.674
 29   1.699   2.045   2.462   2.756   3.659
 30   1.697   2.042   2.457   2.750   3.646
 40   1.684   2.021   2.423   2.704   3.551
 60   1.671   2.000   2.390   2.660   3.460
120   1.658   1.980   2.358   2.617   3.373
  ∞   1.645   1.960   2.326   2.576   3.291

* Table E is abridged from Table III of Fisher and Yates, Statistical tables
for biological, agricultural and medical research, Oliver and Boyd, Ltd.,
Edinburgh, by permission of the authors and publishers.
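The P values heading Table E are two-tailed. A minimal sketch for spot-checking an entry is to integrate the t density from 0 to |t| with Simpson's rule and double the remaining tail. The step count is an assumed default that is adequate for the tabled range, not a tuned choice, and the function names are hypothetical.

```python
import math

def t_density(t, n):
    """Density of Student's t with n degrees of freedom."""
    c = math.exp(math.lgamma((n + 1) / 2) - math.lgamma(n / 2)) / math.sqrt(n * math.pi)
    return c * (1 + t * t / n) ** (-(n + 1) / 2)

def t_two_tailed_p(t, n, steps=2000):
    """Two-tailed P for t: 1 - 2 * (area from 0 to |t|), by Simpson's rule.

    steps must be even.
    """
    t = abs(t)
    if t == 0:
        return 1.0
    h = t / steps
    s = t_density(0.0, n) + t_density(t, n)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_density(i * h, n)
    return 1 - 2 * s * h / 3

print(round(t_two_tailed_p(2.228, 10), 3))
```

As n grows the t density approaches the normal curve, which is why the last row of Table E matches the familiar normal-curve values 1.645, 1.960, 2.326, and 2.576.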





Table F. Table of F for .05, .01, and .001 Levels of Significance *

(n1 across the top; n2 down the side. For each n2, the three rows give the
.05, .01, and .001 points.)

n2  P           1       2       3       4       5       6       8      12      24       ∞
  1 .05       161     200     216     225     230     234     239     244     249     254
    .01      4052    4999    5403    5625    5764    5859    5981    6106    6234    6366
    .001   405284  500000  540379  562500  576405  585937  598144  610667  623497  636619
  2 .05     18.51   19.00   19.16   19.25   19.30   19.33   19.37   19.41   19.45   19.50
    .01     98.49   99.01   99.17   99.25   99.30   99.33   99.36   99.42   99.46   99.50
    .001    998.5   999.0   999.2   999.2   999.3   999.3   999.4   999.4   999.5   999.5
  3 .05     10.13    9.55    9.28    9.12    9.01    8.94    8.84    8.74    8.64    8.53
    .01     34.12   30.81   29.46   28.71   28.24   27.91   27.49   27.05   26.60   26.12
    .001    167.5   148.5   141.1   137.1   134.6   132.8   130.6   128.3   125.9   123.5
  4 .05      7.71    6.94    6.59    6.39    6.26    6.16    6.04    5.91    5.77    5.63
    .01     21.20   18.00   16.69   15.98   15.52   15.21   14.80   14.37   13.93   13.46
    .001    74.14   61.25   56.18   53.44   51.71   50.53   49.00   47.41   45.77   44.05
  5 .05      6.61    5.79    5.41    5.19    5.05    4.95    4.82    4.68    4.53    4.36
    .01     16.26   13.27   12.06   11.39   10.97   10.67   10.27    9.89    9.47    9.02
    .001    47.04   36.61   33.20   31.09   29.75   28.84   27.64   26.42   25.14   23.78
  6 .05      5.99    5.14    4.76    4.53    4.39    4.28    4.15    4.00    3.84    3.67
    .01     13.74   10.92    9.78    9.15    8.75    8.47    8.10    7.72    7.31    6.88
    .001    35.51   27.00   23.70   21.90   20.81   20.03   19.03   17.99   16.89   15.75
  7 .05      5.59    4.74    4.35    4.12    3.97    3.87    3.73    3.57    3.41    3.23
    .01     12.25    9.55    8.45    7.85    7.46    7.19    6.84    6.47    6.07    5.65
    .001    29.22   21.69   18.77   17.19   16.21   15.52   14.63   13.71   12.73   11.69
  8 .05      5.32    4.46    4.07    3.84    3.69    3.58    3.44    3.28    3.12    2.93
    .01     11.26    8.65    7.59    7.01    6.63    6.37    6.03    5.67    5.28    4.86
    .001    25.42   18.49   15.83   14.39   13.49   12.86   12.04   11.19   10.30    9.34
  9 .05      5.12    4.26    3.86    3.63    3.48    3.37    3.23    3.07    2.90    2.71
    .01     10.56    8.02    6.99    6.42    6.06    5.80    5.47    5.11    4.73    4.31
    .001    22.86   16.39   13.90   12.56   11.71   11.13   10.37    9.57    8.72    7.81
 10 .05      4.96    4.10    3.71    3.48    3.33    3.22    3.07    2.91    2.74    2.54
    .01     10.04    7.56    6.55    5.99    5.64    5.39    5.06    4.71    4.33    3.91
    .001    21.04   14.91   12.55   11.28   10.48    9.92    9.20    8.45    7.64    6.76
 11 .05      4.84    3.98    3.59    3.36    3.20    3.09    2.95    2.79    2.61    2.40
    .01      9.65    7.20    6.22    5.67    5.32    5.07    4.74    4.40    4.02    3.60
    .001    19.69   13.81   11.56   10.35    9.58    9.05    8.35    7.63    6.85    6.00
 12 .05      4.75    3.88    3.49    3.26    3.11    3.00    2.85    2.69    2.50    2.30
    .01      9.33    6.93    5.95    5.41    5.06    4.82    4.50    4.16    3.78    3.36
    .001    18.64   12.97   10.80    9.63    8.89    8.38    7.71    7.00    6.25    5.42

* Table F is reprinted, in rearranged form, from Table V of Fisher and Yates,
Statistical tables for biological, agricultural and medical research, Oliver and
Boyd, Ltd., Edinburgh, by permission of the authors and publishers.




Table F. Table of F for .05, .01, and .001 Levels of Significance (Continued)

n2  P           1       2       3       4       5       6       8      12      24       ∞
 13 .05      4.67    3.80    3.41    3.18    3.02    2.92    2.77    2.60    2.42    2.21
    .01      9.07    6.70    5.74    5.20    4.86    4.62    4.30    3.96    3.59    3.16
    .001    17.81   12.31   10.21    9.07    8.35    7.86    7.21    6.52    5.78    4.97
 14 .05      4.60    3.74    3.34    3.11    2.96    2.85    2.70    2.53    2.35    2.13
    .01      8.86    6.51    5.56    5.03    4.69    4.46    4.14    3.80    3.43    3.00
    .001    17.14   11.78    9.73    8.62    7.92    7.43    6.80    6.13    5.41    4.60
 15 .05      4.54    3.68    3.29    3.06    2.90    2.79    2.64    2.48    2.29    2.07
    .01      8.68    6.36    5.42    4.89    4.56    4.32    4.00    3.67    3.29    2.87
    .001    16.59   11.34    9.34    8.25    7.57    7.09    6.47    5.81    5.10    4.31
 16 .05      4.49    3.63    3.24    3.01    2.85    2.74    2.59    2.42    2.24    2.01
    .01      8.53    6.23    5.29    4.77    4.44    4.20    3.89    3.55    3.18    2.75
    .001    16.12   10.97    9.00    7.94    7.27    6.81    6.19    5.55    4.85    4.06
 17 .05      4.45    3.59    3.20    2.96    2.81    2.70    2.55    2.38    2.19    1.96
    .01      8.40    6.11    5.18    4.67    4.34    4.10    3.79    3.45    3.08    2.65
    .001    15.72   10.66    8.73    7.68    7.02    6.56    5.96    5.32    4.63    3.85
 18 .05      4.41    3.55    3.16    2.93    2.77    2.66    2.51    2.34    2.15    1.92
    .01      8.28    6.01    5.09    4.58    4.25    4.01    3.71    3.37    3.00    2.57
    .001    15.38   10.39    8.49    7.46    6.81    6.35    5.76    5.13    4.45    3.67
 19 .05      4.38    3.52    3.13    2.90    2.74    2.63    2.48    2.31    2.11    1.88
    .01      8.18    5.93    5.01    4.50    4.17    3.94    3.63    3.30    2.92    2.49
    .001    15.08   10.16    8.28    7.26    6.61    6.18    5.59    4.97    4.29    3.52
 20 .05      4.35    3.49    3.10    2.87    2.71    2.60    2.45    2.28    2.08    1.84
    .01      8.10    5.85    4.94    4.43    4.10    3.87    3.56    3.23    2.86    2.42
    .001    14.82    9.95    8.10    7.10    6.46    6.02    5.44    4.82    4.15    3.38
 21 .05      4.32    3.47    3.07    2.84    2.68    2.57    2.42    2.25    2.05    1.81
    .01      8.02    5.78    4.87    4.37    4.04    3.81    3.51    3.17    2.80    2.36
    .001    14.59    9.77    7.94    6.95    6.32    5.88    5.31    4.70    4.03    3.26
 22 .05      4.30    3.44    3.05    2.82    2.66    2.55    2.40    2.23    2.03    1.78
    .01      7.94    5.72    4.82    4.31    3.99    3.76    3.45    3.12    2.75    2.31
    .001    14.38    9.61    7.80    6.81    6.19    5.76    5.19    4.58    3.92    3.15
 23 .05      4.28    3.42    3.03    2.80    2.64    2.53    2.38    2.20    2.00    1.76
    .01      7.88    5.66    4.76    4.26    3.94    3.71    3.41    3.07    2.70    2.26
    .001    14.19    9.47    7.67    6.69    6.08    5.65    5.09    4.48    3.82    3.05
 24 .05      4.26    3.40    3.01    2.78    2.62    2.51    2.36    2.18    1.98    1.73
    .01      7.82    5.61    4.72    4.22    3.90    3.67    3.36    3.03    2.66    2.21
    .001    14.03    9.34    7.55    6.59    5.98    5.55    4.99    4.39    3.74    2.97





Table F. Table of F for .05, .01, and .001 Levels of Significance (Continued)

n2  P           1       2       3       4       5       6       8      12      24       ∞
 25 .05      4.24    3.38    2.99    2.76    2.60    2.49    2.34    2.16    1.96    1.71
    .01      7.77    5.57    4.68    4.18    3.86    3.63    3.32    2.99    2.62    2.17
    .001    13.88    9.22    7.45    6.49    5.88    5.46    4.91    4.31    3.66    2.89
 26 .05      4.22    3.37    2.98    2.74    2.59    2.47    2.32    2.15    1.95    1.69
    .01      7.72    5.53    4.64    4.14    3.82    3.59    3.29    2.96    2.58    2.13
    .001    13.74    9.12    7.36    6.41    5.80    5.38    4.83    4.24    3.59    2.82
 27 .05      4.21    3.35    2.96    2.73    2.57    2.46    2.30    2.13    1.93    1.67
    .01      7.68    5.49    4.60    4.11    3.78    3.56    3.26    2.93    2.55    2.10
    .001    13.61    9.02    7.27    6.33    5.73    5.31    4.76    4.17    3.52    2.75
 28 .05      4.20    3.34    2.95    2.71    2.56    2.44    2.29    2.12    1.91    1.65
    .01      7.64    5.45    4.57    4.07    3.75    3.53    3.23    2.90    2.52    2.06
    .001    13.50    8.93    7.19    6.25    5.66    5.24    4.69    4.11    3.46    2.70
 29 .05      4.18    3.33    2.93    2.70    2.54    2.43    2.28    2.10    1.90    1.64
    .01      7.60    5.42    4.54    4.04    3.73    3.50    3.20    2.87    2.49    2.03
    .001    13.39    8.85    7.12    6.19    5.59    5.18    4.64    4.05    3.41    2.64
 30 .05      4.17    3.32    2.92    2.69    2.53    2.42    2.27    2.09    1.89    1.62
    .01      7.56    5.39    4.51    4.02    3.70    3.47    3.17    2.84    2.47    2.01
    .001    13.29    8.77    7.05    6.12    5.53    5.12    4.58    4.00    3.36    2.59
 40 .05      4.08    3.23    2.84    2.61    2.45    2.34    2.18    2.00    1.79    1.51
    .01      7.31    5.18    4.31    3.83    3.51    3.29    2.99    2.66    2.29    1.80
    .001    12.61    8.25    6.60    5.70    5.13    4.73    4.21    3.64    3.01    2.23
 60 .05      4.00    3.15    2.76    2.52    2.37    2.25    2.10    1.92    1.70    1.39
    .01      7.08    4.98    4.13    3.65    3.34    3.12    2.82    2.50    2.12    1.60
    .001    11.97    7.76    6.17    5.31    4.76    4.37    3.87    3.31    2.69    1.90
120 .05      3.92    3.07    2.68    2.45    2.29    2.17    2.02    1.83    1.61    1.25
    .01      6.85    4.79    3.95    3.48    3.17    2.96    2.66    2.34    1.95    1.38
    .001    11.38    7.31    5.79    4.95    4.42    4.04    3.55    3.02    2.40    1.56
  ∞ .05      3.84    2.99    2.60    2.37    2.21    2.09    1.94    1.75    1.52    1.00
    .01      6.64    4.60    3.78    3.32    3.02    2.80    2.51    2.18    1.79    1.00
    .001    10.83    6.91    5.42    4.62    4.10    3.74    3.27    2.74    2.13    1.00
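Two columns of Table F can be checked without a general F routine. F with 1 and n2 degrees of freedom is t² with n2 degrees of freedom, so the n1 = 1 column squares the entries of Table E (for example, 2.228² ≈ 4.96 for n2 = 10). For n1 = 2 the upper tail has the closed form P = (1 + 2F/n2)^(−n2/2). The sketch below covers only that n1 = 2 case, and the function name is hypothetical.

```python
def f_upper_p_n1_2(f, n2):
    """P that F with 2 and n2 degrees of freedom equals or exceeds f.

    Closed form valid only for n1 = 2: P = (1 + 2f/n2) ** (-n2/2).
    """
    return (1 + 2 * f / n2) ** (-n2 / 2)

# Check the .05 and .01 entries of the n1 = 2 column for a few n2.
for n2, f05, f01 in ((2, 19.00, 99.01), (10, 4.10, 7.56), (30, 3.32, 5.39)):
    print(n2, round(f_upper_p_n1_2(f05, n2), 4), round(f_upper_p_n1_2(f01, n2), 4))
```

Each printed pair should sit close to .05 and .01, with small departures due to the two-decimal rounding of the tabled F values.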





Table G. Squares and Square Roots

N          N²        √N      √10N
1.00   1.0000   1.00000   3.16228
1.01   1.0201   1.00499   3.17805
1.02   1.0404   1.00995   3.19374
1.03   1.0609   1.01489   3.20936
1.04   1.0816   1.01980   3.22490
1.05   1.1025   1.02470   3.24037
1.06   1.1236   1.02956   3.25576
1.07   1.1449   1.03441   3.27109
1.08   1.1664   1.03923   3.28634
1.09   1.1881   1.04403   3.30151
1.10   1.2100   1.04881   3.31662
1.11   1.2321   1.05357   3.33167
1.12   1.2544   1.05830   3.34664
1.13   1.2769   1.06301   3.36155
1.14   1.2996   1.06771   3.37639
1.15   1.3225   1.07238   3.39116
1.16   1.3456   1.07703   3.40588
1.17   1.3689   1.08167   3.42053
1.18   1.3924   1.08628   3.43511
1.19   1.4161   1.09087   3.44964
1.20   1.4400   1.09545   3.46410
1.21   1.4641   1.10000   3.47851
1.22   1.4884   1.10454   3.49285
1.23   1.5129   1.10905   3.50714
1.24   1.5376   1.11355   3.52136
1.25   1.5625   1.11803   3.53553
1.26   1.5876   1.12250   3.54965
1.27   1.6129   1.12694   3.56371
1.28   1.6384   1.13137   3.57771
1.29   1.6641   1.13578   3.59166
1.30   1.6900   1.14018   3.60555
1.31   1.7161   1.14455   3.61939
1.32   1.7424   1.14891   3.63318
1.33   1.7689   1.15326   3.64692
1.34   1.7956   1.15758   3.66060
1.35   1.8225   1.16190   3.67423
1.36   1.8496   1.16619   3.68782
1.37   1.8769   1.17047   3.70135
1.38   1.9044   1.17473   3.71484
1.39   1.9321   1.17898   3.72827
1.40   1.9600   1.18322   3.74166
1.41   1.9881   1.18743   3.75500
1.42   2.0164   1.19164   3.76829
1.43   2.0449   1.19583   3.78153
1.44   2.0736   1.20000   3.79473
1.45   2.1025   1.20416   3.80789
1.46   2.1316   1.20830   3.82099
1.47   2.1609   1.21244   3.83406
1.48   2.1904   1.21655   3.84708
1.49   2.2201   1.22066   3.86005
1.50   2.2500   1.22474   3.87298
1.51   2.2801   1.22882   3.88587
1.52   2.3104   1.23288   3.89872
1.53   2.3409   1.23693   3.91152
1.54   2.3716   1.24097   3.92428
1.55   2.4025   1.24499   3.93700
1.56   2.4336   1.24900   3.94968
1.57   2.4649   1.25300   3.96232
1.58   2.4964   1.25698   3.97492
1.59   2.5281   1.26095   3.98748
1.60   2.5600   1.26491   4.00000
1.61   2.5921   1.26886   4.01248
1.62   2.6244   1.27279   4.02492
1.63   2.6569   1.27671   4.03733
1.64   2.6896   1.28062   4.04969
1.65   2.7225   1.28452   4.06202
1.66   2.7556   1.28841   4.07431
1.67   2.7889   1.29228   4.08656
1.68   2.8224   1.29615   4.09878
1.69   2.8561   1.30000   4.11096
1.70   2.8900   1.30384   4.12311
1.71   2.9241   1.30767   4.13521
1.72   2.9584   1.31149   4.14729
1.73   2.9929   1.31529   4.15933
1.74   3.0276   1.31909   4.17133
1.75   3.0625   1.32288   4.18330
1.76   3.0976   1.32665   4.19524
1.77   3.1329   1.33041   4.20714
1.78   3.1684   1.33417   4.21900
1.79   3.2041   1.33791   4.23084
1.80   3.2400   1.34164   4.24264
1.81   3.2761   1.34536   4.25441
1.82   3.3124   1.34907   4.26615
1.83   3.3489   1.35277   4.27785
1.84   3.3856   1.35647   4.28952
1.85   3.4225   1.36015   4.30116
1.86   3.4596   1.36382   4.31277
1.87   3.4969   1.36748   4.32435
1.88   3.5344   1.37113   4.33590
1.89   3.5721   1.37477   4.34741
1.90   3.6100   1.37840   4.35890
1.91   3.6481   1.38203   4.37035
1.92   3.6864   1.38564   4.38178
1.93   3.7249   1.38924   4.39318
1.94   3.7636   1.39284   4.40454
1.95   3.8025   1.39642   4.41588
1.96   3.8416   1.40000   4.42719
1.97   3.8809   1.40357   4.43847
1.98   3.9204   1.40712   4.44972
1.99   3.9601   1.41067   4.46094
2.00   4.0000   1.41421   4.47214




Table G. Squares and Square Roots (Continued)

N       N²       √N        √10N
2.01    4.0401   1.41774   4.48330
2.02    4.0804   1.42127   4.49444
2.03    4.1209   1.42478   4.50555
2.04    4.1616   1.42829   4.51664
2.05    4.2025   1.43178   4.52769
2.06    4.2436   1.43527   4.53872
2.07    4.2849   1.43875   4.54973
2.08    4.3264   1.44222   4.56070
2.09    4.3681   1.44568   4.57165
2.10    4.4100   1.44914   4.58258
2.11    4.4521   1.45258   4.59347
2.12    4.4944   1.45602   4.60435
2.13    4.5369   1.45945   4.61519
2.14    4.5796   1.46287   4.62601
2.15    4.6225   1.46629   4.63681
2.16    4.6656   1.46969   4.64758
2.17    4.7089   1.47309   4.65833
2.18    4.7524   1.47648   4.66905
2.19    4.7961   1.47986   4.67974
2.20    4.8400   1.48324   4.69042
2.21    4.8841   1.48661   4.70106
2.22    4.9284   1.48997   4.71169
2.23    4.9729   1.49332   4.72229
2.24    5.0176   1.49666   4.73286
2.25    5.0625   1.50000   4.74342
2.26    5.1076   1.50333   4.75395
2.27    5.1529   1.50665   4.76445
2.28    5.1984   1.50997   4.77493
2.29    5.2441   1.51327   4.78539
2.30    5.2900   1.51658   4.79583
2.31    5.3361   1.51987   4.80625
2.32    5.3824   1.52315   4.81664
2.33    5.4289   1.52643   4.82701
2.34    5.4756   1.52971   4.83735
2.35    5.5225   1.53297   4.84768
2.36    5.5696   1.53623   4.85798
2.37    5.6169   1.53948   4.86826
2.38    5.6644   1.54272   4.87852
2.39    5.7121   1.54596   4.88876
2.40    5.7600   1.54919   4.89898
2.41    5.8081   1.55242   4.90918
2.42    5.8564   1.55563   4.91935
2.43    5.9049   1.55885   4.92950
2.44    5.9536   1.56205   4.93964
2.45    6.0025   1.56525   4.94975
2.46    6.0516   1.56844   4.95984
2.47    6.1009   1.57162   4.96991
2.48    6.1504   1.57480   4.97996
2.49    6.2001   1.57797   4.98999
2.50    6.2500   1.58114   5.00000
2.51    6.3001   1.58430   5.00999
2.52    6.3504   1.58745   5.01996
2.53    6.4009   1.59060   5.02991
2.54    6.4516   1.59374   5.03984
2.55    6.5025   1.59687   5.04975
2.56    6.5536   1.60000   5.05964
2.57    6.6049   1.60312   5.06952
2.58    6.6564   1.60624   5.07937
2.59    6.7081   1.60935   5.08920
2.60    6.7600   1.61245   5.09902
2.61    6.8121   1.61555   5.10882
2.62    6.8644   1.61864   5.11859
2.63    6.9169   1.62173   5.12835
2.64    6.9696   1.62481   5.13809
2.65    7.0225   1.62788   5.14782
2.66    7.0756   1.63095   5.15752
2.67    7.1289   1.63401   5.16720
2.68    7.1824   1.63707   5.17687
2.69    7.2361   1.64012   5.18652
2.70    7.2900   1.64317   5.19615
2.71    7.3441   1.64621   5.20577
2.72    7.3984   1.64924   5.21536
2.73    7.4529   1.65227   5.22494
2.74    7.5076   1.65529   5.23450
2.75    7.5625   1.65831   5.24404
2.76    7.6176   1.66132   5.25357
2.77    7.6729   1.66433   5.26308
2.78    7.7284   1.66733   5.27257
2.79    7.7841   1.67033   5.28205
2.80    7.8400   1.67332   5.29150
2.81    7.8961   1.67631   5.30094
2.82    7.9524   1.67929   5.31037
2.83    8.0089   1.68226   5.31977
2.84    8.0656   1.68523   5.32917
2.85    8.1225   1.68819   5.33854
2.86    8.1796   1.69115   5.34790
2.87    8.2369   1.69411   5.35724
2.88    8.2944   1.69706   5.36656
2.89    8.3521   1.70000   5.37587
2.90    8.4100   1.70294   5.38516
2.91    8.4681   1.70587   5.39444
2.92    8.5264   1.70880   5.40370
2.93    8.5849   1.71172   5.41295
2.94    8.6436   1.71464   5.42218
2.95    8.7025   1.71756   5.43139
2.96    8.7616   1.72047   5.44059
2.97    8.8209   1.72337   5.44977
2.98    8.8804   1.72627   5.45894
2.99    8.9401   1.72916   5.46809
3.00    9.0000   1.73205   5.47723




3.01    9.0601   1.73494   5.48635
3.02    9.1204   1.73781   5.49545
3.03    9.1809   1.74069   5.50454
3.04    9.2416   1.74356   5.51362
3.05    9.3025   1.74642   5.52268
3.06    9.3636   1.74929   5.53173
3.07    9.4249   1.75214   5.54076
3.08    9.4864   1.75499   5.54977
3.09    9.5481   1.75784   5.55878
3.10    9.6100   1.76068   5.56776
3.11    9.6721   1.76352   5.57674
3.12    9.7344   1.76635   5.58570
3.13    9.7969   1.76918   5.59464
3.14    9.8596   1.77200   5.60357
3.15    9.9225   1.77482   5.61249
3.16    9.9856   1.77764   5.62139
3.17   10.0489   1.78045   5.63028
3.18   10.1124   1.78326   5.63915
3.19   10.1761   1.78606   5.64801
3.20   10.2400   1.78885   5.65685
3.21   10.3041   1.79165   5.66569
3.22   10.3684   1.79444   5.67450
3.23   10.4329   1.79722   5.68331
3.24   10.4976   1.80000   5.69210
3.25   10.5625   1.80278   5.70088
3.26   10.6276   1.80555   5.70964
3.27   10.6929   1.80831   5.71839
3.28   10.7584   1.81108   5.72713
3.29   10.8241   1.81384   5.73585
3.30   10.8900   1.81659   5.74456
3.31   10.9561   1.81934   5.75326
3.32   11.0224   1.82209   5.76194
3.33   11.0889   1.82483   5.77062
3.34   11.1556   1.82757   5.77927
3.35   11.2225   1.83030   5.78792
3.36   11.2896   1.83303   5.79655
3.37   11.3569   1.83576   5.80517
3.38   11.4244   1.83848   5.81378
3.39   11.4921   1.84120   5.82237
3.40   11.5600   1.84391   5.83095
3.41   11.6281   1.84662   5.83952
3.42   11.6964   1.84932   5.84808
3.43   11.7649   1.85203   5.85662
3.44   11.8336   1.85472   5.86515
3.45   11.9025   1.85742   5.87367
3.46   11.9716   1.86011   5.88218
3.47   12.0409   1.86279   5.89067
3.48   12.1104   1.86548   5.89915
3.49   12.1801   1.86815   5.90762
3.50   12.2500   1.87083   5.91608
3.51   12.3201   1.87350   5.92453
3.52   12.3904   1.87617   5.93296
3.53   12.4609   1.87883   5.94138
3.54   12.5316   1.88149   5.94979
3.55   12.6025   1.88414   5.95819
3.56   12.6736   1.88680   5.96657
3.57   12.7449   1.88944   5.97495
3.58   12.8164   1.89209   5.98331
3.59   12.8881   1.89473   5.99166
3.60   12.9600   1.89737   6.00000
3.61   13.0321   1.90000   6.00833
3.62   13.1044   1.90263   6.01664
3.63   13.1769   1.90526   6.02495
3.64   13.2496   1.90788   6.03324
3.65   13.3225   1.91050   6.04152
3.66   13.3956   1.91311   6.04979
3.67   13.4689   1.91572   6.05805
3.68   13.5424   1.91833   6.06630
3.69   13.6161   1.92094   6.07454
3.70   13.6900   1.92354   6.08276
3.71   13.7641   1.92614   6.09098
3.72   13.8384   1.92873   6.09918
3.73   13.9129   1.93132   6.10737
3.74   13.9876   1.93391   6.11555
3.75   14.0625   1.93649   6.12372
3.76   14.1376   1.93907   6.13188
3.77   14.2129   1.94165   6.14003
3.78   14.2884   1.94422   6.14817
3.79   14.3641   1.94679   6.15630
3.80   14.4400   1.94936   6.16441
3.81   14.5161   1.95192   6.17252
3.82   14.5924   1.95448   6.18061
3.83   14.6689   1.95704   6.18870
3.84   14.7456   1.95959   6.19677
3.85   14.8225   1.96214   6.20484
3.86   14.8996   1.96469   6.21289
3.87   14.9769   1.96723   6.22093
3.88   15.0544   1.96977   6.22896
3.89   15.1321   1.97231   6.23699
3.90   15.2100   1.97484   6.24500
3.91   15.2881   1.97737   6.25300
3.92   15.3664   1.97990   6.26099
3.93   15.4449   1.98242   6.26897
3.94   15.5236   1.98494   6.27694
3.95   15.6025   1.98746   6.28490
3.96   15.6816   1.98997   6.29285
3.97   15.7609   1.99249   6.30079
3.98   15.8404   1.99499   6.30872
3.99   15.9201   1.99750   6.31664
4.00   16.0000   2.00000   6.32456




4.01   16.0801   2.00250   6.33246
4.02   16.1604   2.00499   6.34035
4.03   16.2409   2.00749   6.34823
4.04   16.3216   2.00998   6.35610
4.05   16.4025   2.01246   6.36396
4.06   16.4836   2.01494   6.37181
4.07   16.5649   2.01742   6.37966
4.08   16.6464   2.01990   6.38749
4.09   16.7281   2.02237   6.39531
4.10   16.8100   2.02485   6.40312
4.11   16.8921   2.02731   6.41093
4.12   16.9744   2.02978   6.41872
4.13   17.0569   2.03224   6.42651
4.14   17.1396   2.03470   6.43428
4.15   17.2225   2.03715   6.44205
4.16   17.3056   2.03961   6.44981
4.17   17.3889   2.04206   6.45755
4.18   17.4724   2.04450   6.46529
4.19   17.5561   2.04695   6.47302
4.20   17.6400   2.04939   6.48074
4.21   17.7241   2.05183   6.48845
4.22   17.8084   2.05426   6.49615
4.23   17.8929   2.05670   6.50384
4.24   17.9776   2.05913   6.51153
4.25   18.0625   2.06155   6.51920
4.26   18.1476   2.06398   6.52687
4.27   18.2329   2.06640   6.53452
4.28   18.3184   2.06882   6.54217
4.29   18.4041   2.07123   6.54981
4.30   18.4900   2.07364   6.55744
4.31   18.5761   2.07605   6.56506
4.32   18.6624   2.07846   6.57267
4.33   18.7489   2.08087   6.58027
4.34   18.8356   2.08327   6.58787
4.35   18.9225   2.08567   6.59545
4.36   19.0096   2.08806   6.60303
4.37   19.0969   2.09045   6.61060
4.38   19.1844   2.09284   6.61816
4.39   19.2721   2.09523   6.62571
4.40   19.3600   2.09762   6.63325
4.41   19.4481   2.10000   6.64078
4.42   19.5364   2.10238   6.64831
4.43   19.6249   2.10476   6.65582
4.44   19.7136   2.10713   6.66333
4.45   19.8025   2.10950   6.67083
4.46   19.8916   2.11187   6.67832
4.47   19.9809   2.11424   6.68581
4.48   20.0704   2.11660   6.69328
4.49   20.1601   2.11896   6.70075
4.50   20.2500   2.12132   6.70820
4.51   20.3401   2.12368   6.71565
4.52   20.4304   2.12603   6.72309
4.53   20.5209   2.12838   6.73053
4.54   20.6116   2.13073   6.73795
4.55   20.7025   2.13307   6.74537
4.56   20.7936   2.13542   6.75278
4.57   20.8849   2.13776   6.76018
4.58   20.9764   2.14009   6.76757
4.59   21.0681   2.14243   6.77495
4.60   21.1600   2.14476   6.78233
4.61   21.2521   2.14709   6.78970
4.62   21.3444   2.14942   6.79706
4.63   21.4369   2.15174   6.80441
4.64   21.5296   2.15407   6.81175
4.65   21.6225   2.15639   6.81909
4.66   21.7156   2.15870   6.82642
4.67   21.8089   2.16102   6.83374
4.68   21.9024   2.16333   6.84105
4.69   21.9961   2.16564   6.84836
4.70   22.0900   2.16795   6.85565
4.71   22.1841   2.17025   6.86294
4.72   22.2784   2.17256   6.87023
4.73   22.3729   2.17486   6.87750
4.74   22.4676   2.17715   6.88477
4.75   22.5625   2.17945   6.89202
4.76   22.6576   2.18174   6.89928
4.77   22.7529   2.18403   6.90652
4.78   22.8484   2.18632   6.91375
4.79   22.9441   2.18861   6.92098
4.80   23.0400   2.19089   6.92820
4.81   23.1361   2.19317   6.93542
4.82   23.2324   2.19545   6.94262
4.83   23.3289   2.19773   6.94982
4.84   23.4256   2.20000   6.95701
4.85   23.5225   2.20227   6.96419
4.86   23.6196   2.20454   6.97137
4.87   23.7169   2.20681   6.97854
4.88   23.8144   2.20907   6.98570
4.89   23.9121   2.21133   6.99285
4.90   24.0100   2.21359   7.00000
4.91   24.1081   2.21585   7.00714
4.92   24.2064   2.21811   7.01427
4.93   24.3049   2.22036   7.02140
4.94   24.4036   2.22261   7.02851
4.95   24.5025   2.22486   7.03562
4.96   24.6016   2.22711   7.04273
4.97   24.7009   2.22935   7.04982
4.98   24.8004   2.23159   7.05691
4.99   24.9001   2.23383   7.06399
5.00   25.0000   2.23607   7.07107




5.01   25.1001   2.23830   7.07814
5.02   25.2004   2.24054   7.08520
5.03   25.3009   2.24277   7.09225
5.04   25.4016   2.24499   7.09930
5.05   25.5025   2.24722   7.10634
5.06   25.6036   2.24944   7.11337
5.07   25.7049   2.25167   7.12039
5.08   25.8064   2.25389   7.12741
5.09   25.9081   2.25610   7.13442
5.10   26.0100   2.25832   7.14143
5.11   26.1121   2.26053   7.14843
5.12   26.2144   2.26274   7.15542
5.13   26.3169   2.26495   7.16240
5.14   26.4196   2.26716   7.16938
5.15   26.5225   2.26936   7.17635
5.16   26.6256   2.27156   7.18331
5.17   26.7289   2.27376   7.19027
5.18   26.8324   2.27596   7.19722
5.19   26.9361   2.27816   7.20417
5.20   27.0400   2.28035   7.21110
5.21   27.1441   2.28254   7.21803
5.22   27.2484   2.28473   7.22496
5.23   27.3529   2.28692   7.23187
5.24   27.4576   2.28910   7.23878
5.25   27.5625   2.29129   7.24569
5.26   27.6676   2.29347   7.25259
5.27   27.7729   2.29565   7.25948
5.28   27.8784   2.29783   7.26636
5.29   27.9841   2.30000   7.27324
5.30   28.0900   2.30217   7.28011
5.31   28.1961   2.30434   7.28697
5.32   28.3024   2.30651   7.29383
5.33   28.4089   2.30868   7.30068
5.34   28.5156   2.31084   7.30753
5.35   28.6225   2.31301   7.31437
5.36   28.7296   2.31517   7.32120
5.37   28.8369   2.31733   7.32803
5.38   28.9444   2.31948   7.33485
5.39   29.0521   2.32164   7.34166
5.40   29.1600   2.32379   7.34847
5.41   29.2681   2.32594   7.35527
5.42   29.3764   2.32809   7.36206
5.43   29.4849   2.33024   7.36885
5.44   29.5936   2.33238   7.37564
5.45   29.7025   2.33452   7.38241
5.46   29.8116   2.33666   7.38918
5.47   29.9209   2.33880   7.39594
5.48   30.0304   2.34094   7.40270
5.49   30.1401   2.34307   7.40945
5.50   30.2500   2.34521   7.41620
5.51   30.3601   2.34734   7.42294
5.52   30.4704   2.34947   7.42967
5.53   30.5809   2.35160   7.43640
5.54   30.6916   2.35372   7.44312
5.55   30.8025   2.35584   7.44983
5.56   30.9136   2.35797   7.45654
5.57   31.0249   2.36008   7.46324
5.58   31.1364   2.36220   7.46994
5.59   31.2481   2.36432   7.47663
5.60   31.3600   2.36643   7.48331
5.61   31.4721   2.36854   7.48999
5.62   31.5844   2.37065   7.49667
5.63   31.6969   2.37276   7.50333
5.64   31.8096   2.37487   7.50999
5.65   31.9225   2.37697   7.51665
5.66   32.0356   2.37908   7.52330
5.67   32.1489   2.38118   7.52994
5.68   32.2624   2.38328   7.53658
5.69   32.3761   2.38537   7.54321
5.70   32.4900   2.38747   7.54983
5.71   32.6041   2.38956   7.55645
5.72   32.7184   2.39165   7.56307
5.73   32.8329   2.39374   7.56968
5.74   32.9476   2.39583   7.57628
5.75   33.0625   2.39792   7.58288
5.76   33.1776   2.40000   7.58947
5.77   33.2929   2.40208   7.59605
5.78   33.4084   2.40416   7.60263
5.79   33.5241   2.40624   7.60920
5.80   33.6400   2.40832   7.61577
5.81   33.7561   2.41039   7.62234
5.82   33.8724   2.41247   7.62889
5.83   33.9889   2.41454   7.63544
5.84   34.1056   2.41661   7.64199
5.85   34.2225   2.41868   7.64853
5.86   34.3396   2.42074   7.65506
5.87   34.4569   2.42281   7.66159
5.88   34.5744   2.42487   7.66812
5.89   34.6921   2.42693   7.67463
5.90   34.8100   2.42899   7.68115
5.91   34.9281   2.43105   7.68765
5.92   35.0464   2.43311   7.69415
5.93   35.1649   2.43516   7.70065
5.94   35.2836   2.43721   7.70714
5.95   35.4025   2.43926   7.71362
5.96   35.5216   2.44131   7.72010
5.97   35.6409   2.44336   7.72658
5.98   35.7604   2.44540   7.73305
5.99   35.8801   2.44745   7.73951
6.00   36.0000   2.44949   7.74597



6.01   36.1201   2.45153   7.75242
6.02   36.2404   2.45357   7.75887
6.03   36.3609   2.45561   7.76531
6.04   36.4816   2.45764   7.77174
6.05   36.6025   2.45967   7.77817
6.06   36.7236   2.46171   7.78460
6.07   36.8449   2.46374   7.79102
6.08   36.9664   2.46577   7.79744
6.09   37.0881   2.46779   7.80385
6.10   37.2100   2.46982   7.81025
6.11   37.3321   2.47184   7.81665
6.12   37.4544   2.47386   7.82304
6.13   37.5769   2.47588   7.82943
6.14   37.6996   2.47790   7.83582
6.15   37.8225   2.47992   7.84219
6.16   37.9456   2.48193   7.84857
6.17   38.0689   2.48395   7.85493
6.18   38.1924   2.48596   7.86130
6.19   38.3161   2.48797   7.86766
6.20   38.4400   2.48998   7.87401
6.21   38.5641   2.49199   7.88036
6.22   38.6884   2.49399   7.88670
6.23   38.8129   2.49600   7.89303
6.24   38.9376   2.49800   7.89937
6.25   39.0625   2.50000   7.90569
6.26   39.1876   2.50200   7.91202
6.27   39.3129   2.50400   7.91833
6.28   39.4384   2.50599   7.92465
6.29   39.5641   2.50799   7.93095
6.30   39.6900   2.50998   7.93725
6.31   39.8161   2.51197   7.94355
6.32   39.9424   2.51396   7.94984
6.33   40.0689   2.51595   7.95613
6.34   40.1956   2.51794   7.96241
6.35   40.3225   2.51992   7.96869
6.36   40.4496   2.52190   7.97496
6.37   40.5769   2.52389   7.98123
6.38   40.7044   2.52587   7.98749
6.39   40.8321   2.52784   7.99375
6.40   40.9600   2.52982   8.00000
6.41   41.0881   2.53180   8.00625
6.42   41.2164   2.53377   8.01249
6.43   41.3449   2.53574   8.01873
6.44   41.4736   2.53772   8.02496
6.45   41.6025   2.53969   8.03119
6.46   41.7316   2.54165   8.03741
6.47   41.8609   2.54362   8.04363
6.48   41.9904   2.54558   8.04984
6.49   42.1201   2.54755   8.05605
6.50   42.2500   2.54951   8.06226
6.51   42.3801   2.55147   8.06846
6.52   42.5104   2.55343   8.07465
6.53   42.6409   2.55539   8.08084
6.54   42.7716   2.55734   8.08703
6.55   42.9025   2.55930   8.09321
6.56   43.0336   2.56125   8.09938
6.57   43.1649   2.56320   8.10555
6.58   43.2964   2.56515   8.11172
6.59   43.4281   2.56710   8.11788
6.60   43.5600   2.56905   8.12404
6.61   43.6921   2.57099   8.13019
6.62   43.8244   2.57294   8.13634
6.63   43.9569   2.57488   8.14248
6.64   44.0896   2.57682   8.14862
6.65   44.2225   2.57876   8.15475
6.66   44.3556   2.58070   8.16088
6.67   44.4889   2.58263   8.16701
6.68   44.6224   2.58457   8.17313
6.69   44.7561   2.58650   8.17924
6.70   44.8900   2.58844   8.18535
6.71   45.0241   2.59037   8.19146
6.72   45.1584   2.59230   8.19756
6.73   45.2929   2.59422   8.20366
6.74   45.4276   2.59615   8.20975
6.75   45.5625   2.59808   8.21584
6.76   45.6976   2.60000   8.22192
6.77   45.8329   2.60192   8.22800
6.78   45.9684   2.60384   8.23408
6.79   46.1041   2.60576   8.24015
6.80   46.2400   2.60768   8.24621
6.81   46.3761   2.60960   8.25227
6.82   46.5124   2.61151   8.25833
6.83   46.6489   2.61343   8.26438
6.84   46.7856   2.61534   8.27043
6.85   46.9225   2.61725   8.27647
6.86   47.0596   2.61916   8.28251
6.87   47.1969   2.62107   8.28855
6.88   47.3344   2.62298   8.29458
6.89   47.4721   2.62488   8.30060
6.90   47.6100   2.62679   8.30662
6.91   47.7481   2.62869   8.31264
6.92   47.8864   2.63059   8.31865
6.93   48.0249   2.63249   8.32466
6.94   48.1636   2.63439   8.33067
6.95   48.3025   2.63629   8.33667
6.96   48.4416   2.63818   8.34266
6.97   48.5809   2.64008   8.34865
6.98   48.7204   2.64197   8.35464
6.99   48.8601   2.64386   8.36062
7.00   49.0000   2.64575   8.36660















7.01   49.1401   2.64764   8.37257
7.02   49.2804   2.64953   8.37854
7.03   49.4209   2.65141   8.38451
7.04   49.5616   2.65330   8.39047
7.05   49.7025   2.65518   8.39643
7.06   49.8436   2.65707   8.40238
7.07   49.9849   2.65895   8.40833
7.08   50.1264   2.66083   8.41427
7.09   50.2681   2.66271   8.42021
7.10   50.4100   2.66458   8.42615
7.11   50.5521   2.66646   8.43208
7.12   50.6944   2.66833   8.43801
7.13   50.8369   2.67021   8.44393
7.14   50.9796   2.67208   8.44985
7.15   51.1225   2.67395   8.45577
7.16   51.2656   2.67582   8.46168
7.17   51.4089   2.67769   8.46759
7.18   51.5524   2.67955   8.47349
7.19   51.6961   2.68142   8.47939
7.20   51.8400   2.68328   8.48528
7.21   51.9841   2.68514   8.49117
7.22   52.1284   2.68701   8.49706
7.23   52.2729   2.68887   8.50294
7.24   52.4176   2.69072   8.50882
7.25   52.5625   2.69258   8.51469
7.26   52.7076   2.69444   8.52056
7.27   52.8529   2.69629   8.52643
7.28   52.9984   2.69815   8.53229
7.29   53.1441   2.70000   8.53815
7.30   53.2900   2.70185   8.54400
7.31   53.4361   2.70370   8.54985
7.32   53.5824   2.70555   8.55570
7.33   53.7289   2.70740   8.56154
7.34   53.8756   2.70924   8.56738
7.35   54.0225   2.71109   8.57321
7.36   54.1696   2.71293   8.57904
7.37   54.3169   2.71477   8.58487
7.38   54.4644   2.71662   8.59069
7.39   54.6121   2.71846   8.59651
7.40   54.7600   2.72029   8.60233
7.41   54.9081   2.72213   8.60814
7.42   55.0564   2.72397   8.61394
7.43   55.2049   2.72580   8.61974
7.44   55.3536   2.72764   8.62554
7.45   55.5025   2.72947   8.63134
7.46   55.6516   2.73130   8.63713
7.47   55.8009   2.73313   8.64292
7.48   55.9504   2.73496   8.64870
7.49   56.1001   2.73679   8.65448
7.50   56.2500   2.73861   8.66025
7.51   56.4001   2.74044   8.66603
7.52   56.5504   2.74226   8.67179
7.53   56.7009   2.74408   8.67756
7.54   56.8516   2.74591   8.68332
7.55   57.0025   2.74773   8.68907
7.56   57.1536   2.74955   8.69483
7.57   57.3049   2.75136   8.70057
7.58   57.4564   2.75318   8.70632
7.59   57.6081   2.75500   8.71206
7.60   57.7600   2.75681   8.71780
7.61   57.9121   2.75862   8.72353
7.62   58.0644   2.76043   8.72926
7.63   58.2169   2.76225   8.73499
7.64   58.3696   2.76405   8.74071
7.65   58.5225   2.76586   8.74643
7.66   58.6756   2.76767   8.75214
7.67   58.8289   2.76948   8.75785
7.68   58.9824   2.77128   8.76356
7.69   59.1361   2.77308   8.76926
7.70   59.2900   2.77489   8.77496
7.71   59.4441   2.77669   8.78066
7.72   59.5984   2.77849   8.78635
7.73   59.7529   2.78029   8.79204
7.74   59.9076   2.78209   8.79773
7.75   60.0625   2.78388   8.80341
7.76   60.2176   2.78568   8.80909
7.77   60.3729   2.78747   8.81476
7.78   60.5284   2.78927   8.82043
7.79   60.6841   2.79106   8.82610
7.80   60.8400   2.79285   8.83176
7.81   60.9961   2.79464   8.83742
7.82   61.1524   2.79643   8.84308
7.83   61.3089   2.79821   8.84873
7.84   61.4656   2.80000   8.85438
7.85   61.6225   2.80179   8.86002
7.86   61.7796   2.80357   8.86566
7.87   61.9369   2.80535   8.87130
7.88   62.0944   2.80713   8.87694
7.89   62.2521   2.80891   8.88257
7.90   62.4100   2.81069   8.88819
7.91   62.5681   2.81247   8.89382
7.92   62.7264   2.81425   8.89944
7.93   62.8849   2.81603   8.90505
7.94   63.0436   2.81780   8.91067
7.95   63.2025   2.81957   8.91628
7.96   63.3616   2.82135   8.92188
7.97   63.5209   2.82312   8.92749
7.98   63.6804   2.82489   8.93308
7.99   63.8401   2.82666   8.93868
8.00   64.0000   2.82843   8.94427




8.01   64.1601   2.83019   8.94986
8.02   64.3204   2.83196   8.95545
8.03   64.4809   2.83373   8.96103
8.04   64.6416   2.83549   8.96660
8.05   64.8025   2.83725   8.97218
8.06   64.9636   2.83901   8.97775
8.07   65.1249   2.84077   8.98332
8.08   65.2864   2.84253   8.98888
8.09   65.4481   2.84429   8.99444
8.10   65.6100   2.84605   9.00000
8.11   65.7721   2.84781   9.00555
8.12   65.9344   2.84956   9.01110
8.13   66.0969   2.85132   9.01665
8.14   66.2596   2.85307   9.02219
8.15   66.4225   2.85482   9.02774
8.16   66.5856   2.85657   9.03327
8.17   66.7489   2.85832   9.03881
8.18   66.9124   2.86007   9.04434
8.19   67.0761   2.86182   9.04986
8.20   67.2400   2.86356   9.05539
8.21   67.4041   2.86531   9.06091
8.22   67.5684   2.86705   9.06642
8.23   67.7329   2.86880   9.07193
8.24   67.8976   2.87054   9.07744
8.25   68.0625   2.87228   9.08295
8.26   68.2276   2.87402   9.08845
8.27   68.3929   2.87576   9.09395
8.28   68.5584   2.87750   9.09945
8.29   68.7241   2.87924   9.10494
8.30   68.8900   2.88097   9.11043
8.31   69.0561   2.88271   9.11592
8.32   69.2224   2.88444   9.12140
8.33   69.3889   2.88617   9.12688
8.34   69.5556   2.88791   9.13236
8.35   69.7225   2.88964   9.13783
8.36   69.8896   2.89137   9.14330
8.37   70.0569   2.89310   9.14877
8.38   70.2244   2.89482   9.15423
8.39   70.3921   2.89655   9.15969
8.40   70.5600   2.89828   9.16515
8.41   70.7281   2.90000   9.17061
8.42   70.8964   2.90172   9.17606
8.43   71.0649   2.90345   9.18150
8.44   71.2336   2.90517   9.18695
8.45   71.4025   2.90689   9.19239
8.46   71.5716   2.90861   9.19783
8.47   71.7409   2.91033   9.20326
8.48   71.9104   2.91204   9.20869
8.49   72.0801   2.91376   9.21412
8.50   72.2500   2.91548   9.21954
8.51   72.4201   2.91719   9.22497
8.52   72.5904   2.91890   9.23038
8.53   72.7609   2.92062   9.23580
8.54   72.9316   2.92233   9.24121
8.55   73.1025   2.92404   9.24662
8.56   73.2736   2.92575   9.25203
8.57   73.4449   2.92746   9.25743
8.58   73.6164   2.92916   9.26283
8.59   73.7881   2.93087   9.26823
8.60   73.9600   2.93258   9.27362
8.61   74.1321   2.93428   9.27901
8.62   74.3044   2.93598   9.28440
8.63   74.4769   2.93769   9.28978
8.64   74.6496   2.93939   9.29516
8.65   74.8225   2.94109   9.30054
8.66   74.9956   2.94279   9.30591
8.67   75.1689   2.94449   9.31128
8.68   75.3424   2.94618   9.31665
8.69   75.5161   2.94788   9.32202
8.70   75.6900   2.94958   9.32738
8.71   75.8641   2.95127   9.33274
8.72   76.0384   2.95296   9.33809
8.73   76.2129   2.95466   9.34345
8.74   76.3876   2.95635   9.34880
8.75   76.5625   2.95804   9.35414
8.76   76.7376   2.95973   9.35949
8.77   76.9129   2.96142   9.36483
8.78   77.0884   2.96311   9.37017
8.79   77.2641   2.96479   9.37550
8.80   77.4400   2.96648   9.38083
8.81   77.6161   2.96816   9.38616
8.82   77.7924   2.96985   9.39149
8.83   77.9689   2.97153   9.39681
8.84   78.1456   2.97321   9.40213
8.85   78.3225   2.97489   9.40744
8.86   78.4996   2.97658   9.41276
8.87   78.6769   2.97825   9.41807
8.88   78.8544   2.97993   9.42338
8.89   79.0321   2.98161   9.42868
8.90   79.2100   2.98329   9.43398
8.91   79.3881   2.98496   9.43928
8.92   79.5664   2.98664   9.44458
8.93   79.7449   2.98831   9.44987
8.94   79.9236   2.98998   9.45516
8.95   80.1025   2.99166   9.46044
8.96   80.2816   2.99333   9.46573
8.97   80.4609   2.99500   9.47101
8.98   80.6404   2.99666   9.47629
8.99   80.8201   2.99833   9.48156
9.00   81.0000   3.00000   9.48683




9.01   81.1801   3.00167   9.49210
9.02   81.3604   3.00333   9.49737
9.03   81.5409   3.00500   9.50263
9.04   81.7216   3.00666   9.50789
9.05   81.9025   3.00832   9.51315
9.06   82.0836   3.00998   9.51840
9.07   82.2649   3.01164   9.52365
9.08   82.4464   3.01330   9.52890
9.09   82.6281   3.01496   9.53415
9.10   82.8100   3.01662   9.53939
9.11   82.9921   3.01828   9.54463
9.12   83.1744   3.01993   9.54987
9.13   83.3569   3.02159   9.55510
9.14   83.5396   3.02324   9.56033
9.15   83.7225   3.02490   9.56556
9.16   83.9056   3.02655   9.57079
9.17   84.0889   3.02820   9.57601
9.18   84.2724   3.02985   9.58123
9.19   84.4561   3.03150   9.58645
9.20   84.6400   3.03315   9.59166
9.21   84.8241   3.03480   9.59687
9.22   85.0084   3.03645   9.60208
9.23   85.1929   3.03809   9.60729
9.24   85.3776   3.03974   9.61249
9.25   85.5625   3.04138   9.61769
9.26   85.7476   3.04302   9.62289
9.27   85.9329   3.04467   9.62808
9.28   86.1184   3.04631   9.63328
9.29   86.3041   3.04795   9.63846
9.30   86.4900   3.04959   9.64365
9.31   86.6761   3.05123   9.64883
9.32   86.8624   3.05287   9.65401
9.33   87.0489   3.05450   9.65919
9.34   87.2356   3.05614   9.66437
9.35   87.4225   3.05778   9.66954
9.36   87.6096   3.05941   9.67471
9.37   87.7969   3.06105   9.67988
9.38   87.9844   3.06268   9.68504
9.39   88.1721   3.06431   9.69020
9.40   88.3600   3.06594   9.69536
9.41   88.5481   3.06757   9.70052
9.42   88.7364   3.06920   9.70567
9.43   88.9249   3.07083   9.71082
9.44   89.1136   3.07246   9.71597
9.45   89.3025   3.07409   9.72111
9.46   89.4916   3.07571   9.72625
9.47   89.6809   3.07734   9.73139
9.48   89.8704   3.07896   9.73653
9.49   90.0601   3.08058   9.74166
9.50   90.2500   3.08221   9.74679
9.51   90.4401   3.08383   9.75192
9.52   90.6304   3.08545   9.75705
9.53   90.8209   3.08707   9.76217
9.54   91.0116   3.08869   9.76729
9.55   91.2025   3.09031   9.77241
9.56   91.3936   3.09192   9.77753
9.57   91.5849   3.09354   9.78264
9.58   91.7764   3.09516   9.78775
9.59   91.9681   3.09677   9.79285
9.60   92.1600   3.09839   9.79796
9.61   92.3521   3.10000   9.80306
9.62   92.5444   3.10161   9.80816
9.63   92.7369   3.10322   9.81326
9.64   92.9296   3.10483   9.81835
9.65   93.1225   3.10644   9.82344
9.66   93.3156   3.10805   9.82853
9.67   93.5089   3.10966   9.83362
9.68   93.7024   3.11127   9.83870
9.69   93.8961   3.11288   9.84378
9.70   94.0900   3.11448   9.84886
9.71   94.2841   3.11609   9.85393
9.72   94.4784   3.11769   9.85901
9.73   94.6729   3.11929   9.86408
9.74   94.8676   3.12090   9.86914
9.75   95.0625   3.12250   9.87421
9.76   95.2576   3.12410   9.87927
9.77   95.4529   3.12570   9.88433
9.78   95.6484   3.12730   9.88939
9.79   95.8441   3.12890   9.89444
9.80   96.0400   3.13050   9.89949
9.81   96.2361   3.13209   9.90454
9.82   96.4324   3.13369   9.90959
9.83   96.6289   3.13528   9.91464
9.84   96.8256   3.13688   9.91968
9.85   97.0225   3.13847   9.92472
9.86   97.2196   3.14006   9.92975
9.87   97.4169   3.14166   9.93479
9.88   97.6144   3.14325   9.93982
9.89   97.8121   3.14484   9.94485
9.90   98.0100   3.14643   9.94987
9.91   98.2081   3.14802   9.95490
9.92   98.4064   3.14960   9.95992
9.93   98.6049   3.15119   9.96494
9.94   98.8036   3.15278   9.96995
9.95   99.0025   3.15436   9.97497
9.96   99.2016   3.15595   9.97998
9.97   99.4009   3.15753   9.98499
9.98   99.6004   3.15911   9.98999
9.99   99.8001   3.16070   9.99500
10.00  100.000   3.16228   10.0000
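Each row of Table G follows directly from N: N² to four decimals, √N and √10N to five. The √10N column gives square roots of numbers from 10 to 100, and any other positive number can be brought into the range 1 to 100 by moving its decimal point an even number of places (two places in the number move the root one place). The Python sketch below illustrates both ideas; it is not part of the original table, the function names `table_g_row` and `sqrt_from_table` are hypothetical, and it reads the nearest tabled N without interpolating between rows.

```python
import math


def table_g_row(n):
    """Recompute one row of the table: N, N squared (4 decimals),
    sqrt(N) and sqrt(10N) (5 decimals)."""
    return (round(n, 2), round(n * n, 4), round(math.sqrt(n), 5), round(math.sqrt(10 * n), 5))


def sqrt_from_table(x):
    """Approximate sqrt(x), x > 0, using only the sqrt(N) and sqrt(10N) columns.

    Moving the decimal point two places in x moves it one place in sqrt(x),
    so x is rescaled by powers of 100 until 1 <= x < 100.
    """
    if x <= 0:
        raise ValueError("x must be positive")
    shift = 0  # net number of two-place moves applied to x
    while x >= 100:
        x /= 100
        shift += 1
    while x < 1:
        x *= 100
        shift -= 1
    if x < 10:
        root = table_g_row(round(x, 2))[2]       # read the sqrt(N) column at N = x
    else:
        root = table_g_row(round(x / 10, 2))[3]  # read the sqrt(10N) column at N = x/10
    return root * 10 ** shift


print(table_g_row(2.7))     # (2.7, 7.29, 1.64317, 5.19615)
print(sqrt_from_table(27))  # 5.19615
```

For example, √2700 is read as √27 (the √10N entry for N = 2.70, 5.19615) with the decimal point moved one place to the right, giving 51.9615.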




Index 


Alienation, coefficient of, 135
Analysis of variance, 249-342
applications for significance
of correlation, linear, 264-268, 272-275

of correlation ratio, 262-264, 272-275
of differences:
for correlated means, 288-290, 317, 325
for independent means, 253-255, 256-258

of interaction, 301, 306, 308, 309, 324, 328-335
of multiple correlation, 276-279
of nonlinearity, 268-275
of reliability, 290-294, 310
assumptions:
homogeneity of variances, 249, 255, 311, 335

independent variance estimates, 249, 252

normality, 249, 255, 305-306, 328

violation of, 255 
classifications:
higher, 337-338
one-way or simple, 249-250, 281
two-way or double, 281-288
three-way or triple, 3U-3^
computation:
double classification, 294-296, 298-301
groups of unequal size, 261-262
simple classification, 256-258
single group, 107
triple classification, 317-324
covariance method, 343-356
computation, 350-352
and correlation, 345-347
degrees of freedom, 346, 350
multiple, 354-355
regression adjustments, 347-350, 353


Analysis of variance, covariance method (Continued)
situations for use, 343-344, 353-354
sums of products, 345
degrees of freedom, 251-252, 265-266, 269, 276-277, 286-287, 316
error term for F, 293, 303-310, 327-335
factorial design, 338
interaction, 283, 297
higher, 337-338
illustrations of, 301-303
simple, 297, 301
triple, 315
Latin square design, 338-342
models, 304-305, 327-328
components of variance, 304-305, 308-309, 332
fixed constants, 304-308, 327-330, 338
mixed, 304-305, 309-310, 330-335, 338
pooling, 336, 340, 341
preliminary tests, 335-337
significant F, meaning of, 253-255
sum of squares, 25
between-groups, 251
breakdown of, 249-252
remainder, 286
within-groups, 251
variance estimates, 94
between-groups, 253
expected value of, 305
interaction, 297
meaning of, 252-255
remainder, 287
as error, 292
residual, 267, 287
within-cells, 297
within-groups, 252-253
Arbitrary origin, 16-17 
Area samphng, 362 
Arkin, H , 12 



Array, 116, 122
Attenuation, 159-160
Attributes, 55
Average, 1, 16
Average deviation, 21

Bartlett's test, 248
Best-fit line, 126-130
Beta (β) coefficients, 172
Binomial distribution, 43-46
  and chi square, 213-214
  and hypothesis testing, 49-53
  kurtosis of, 45
  mean of, 45
  and normal curve, 46-49
  and probability, 42-45
  skewness of, 45
  standard deviation of, 45
Biserial correlation, 192-197
  assumptions, 194
  formulas, 193, 196
  interpretation of, 195
  and point biserial, 196-197
  sampling error of, 194, 197
Blakeman criterion, 268
Brinton, W. C., 12
Brown-Spearman formula, 156-157

Central value (tendency), 14
  mean, 16-18
  median, 14-16
  mode, 14
Changes, evaluation of:
  for categorical data, 56-59
  by covariance method, 355-356
  for graduated series, 79-80, 111-112, 355-356
Chesire, L., 199n
Chi square (χ²), 212-240
  additive property of, 226, 230
  applications as test:
    of agreement with a priori frequencies, 222-223
    of changes, 228-230
    of correlation, 225, 235-236
    of goodness of fit, 224, 236-238
    of group differences, 223-224, 225-226, 233-235
    of independence, 223
    of several correlated proportions, 232-233
  assumptions, 221-222
  and binomial, 213-214
  combining of, 226, 230
  correction to, for continuity, 230-231
  and critical ratio, 214, 218, 226-228, 229-230
  degrees of freedom, 216-217, 238-239
  and discontinuity, 214, 221-222
  distribution of, 213-221
    curves, 219
    mathematical, 218
  levels of significance, 218-221, 238
  and normal curve, 214, 221
  and null hypothesis, 220-221
  one- vs. two-tailed tests, 231-232
  and proportions, 226-228
  table of, 386-387
Classification, 5-6; see also under Analysis of variance
Cochran, W. G., 113, 232n
Coded scores, 18, 23
Colton, R. R., 12
Combined groups:
  mean for, 19
  standard deviation for, 26
Common elements and correlation, 140-141
Comparison of groups, 82-83; see also Significance, of differences
Confidence coefficient, 98
Confidence interval, 95-99, 108, 109, 110-111
Confidence level, 98
Confidence limits, 95-99, 107-108, 109, 110-111
Confounded, 329
Contingency coefficient, 203-206
  and chi square, 203, 224-228, 235-236
  corrections to, 205-206
  sampling error of, 206
  upper limits of, 205
Contingency table, 204, 224, 235
Continuity, correction for, 52, 54, 58, 61, 230-231
Continuous series, 5
Correction:
  for attenuation, 159-160
  to contingency coefficient, 205-206
  for continuity, 48, 52, 54, 58, 61, 230-231
  for grouping, 25
  for uncontrolled variable, 344



Correlation and causation, 140, 166-167, 187
Correlation between:
  categorized variables, 197-206
  dichotomized and graduated variables, 192-197
  dichotomized variables, 197-206
  graduated variables, 118, 207
  indexes, 161-163
  means, 86, 88
  point variables, 202-203
  standard deviations, 88
Correlation:
  factors affecting, 144-168
    errors of measurement, 159-160
    heterogeneity, 149-150
    third variable, 164-167
    indexes, 161-163
    part-whole, 164
    range of talent, 149-150
    sampling errors, 145-147, 264-268
    selection, 144
  measures of:
    biserial, 192-197
    contingency, 203-206
    correlation ratio (eta), 207-208, 262-264, 272-275
    fourfold point, 202-203
    intraclass, 280
    multiple, 169-190; see also Multiple correlation
    partial, 164-167
    point biserial, 196-197
    product moment, 115-143; see also Product moment correlation
    rank, 208-210
    tetrachoric, 197-202
Correlation ratio (eta), 207-208
  computation, 272-275
  sampling significance of, 262-264, 272-275
Correlations, averaging of, 148-149
Covariance, 344; see also under Analysis of variance
Cox, G. M., 113
Crespi, L., 235
Critical ratio (CR), 54, 58
  and chi square, 214, 218, 226-228, 229-230
  and F, 261
  and t, 109
Critical region, 67
Cumulative frequency distribution, 10
Curvilinearity, test of, 268-275

Decile, 20
Degrees of freedom:
  in analysis of variance, 251-252, 265-266, 269, 276-277, 286-289, 316
  for chi square, 216-217, 238-240
  for F, 245
  for t test:
    for means, 106-107, 111
    for r, 146, 167
  for variance estimate, 106-107
Deming, W. E., 363n
Descriptive statistics, 2, 13
Deviation score, 21
Differences, see Significance, of differences
Discontinuity, see Continuity
Discrete series, 5
Discriminant function, 210-211
Distribution:
  binomial, 43-46
  chi square, 217-220
  cumulative, 10
  expected, 40
  F, 245
  frequency, 6
  mathematical, 40
  normal, 32-37
  observed, 40
  population, 40
  sampling, 53
  t, 105-106
  theoretical, 40
Distribution-free methods, 357-360
  chi square as, 357
  for correlated sets, 358-359
  Mann-Whitney U test, 359-360
  “median” test, 358
  sign test, 357
Doolittle method, 182-185

Edwards, A. L., 337n
Elderton’s table of chi square, 221
Error:
  absolute, 151
  constant, 151
  in drawing conclusions, 64-70
  of estimate, 131-136, 174-176
  of measurement, 151-154
  reduction, 88-90, 361-366



Error (Continued)
  relative, 151
  sampling, see under Standard error
  standard, see Standard error
  type I and type II, 65-69
  variable, 150, 154-155
Estimate, error of, 131-136, 174-176
Estimation:
  interval, 95-99
  point, 94
Estimator:
  consistency, 94
  efficiency, 94
  unbiased, 94
Eta (η), 207-208
  computation, 272-275
  sampling significance of, 262-264, 272-275
Experimental and control data, treatment of:
  matched distributions, 364-365
  own control, 86, 89, 108-109, 288-290, 317, 325
  paired (or matched) cases, 59-60, 86, 89-90, 108-109, 288-290, 317, 325
  randomly drawn, 87, 109-111, 256-259
  sibs and littermates, 86, 108-109, 288-290, 317, 325
Ezekiel, M., 188

F, or variance ratio, 245
  and critical ratio (CR), 261
  degrees of freedom, 245
  distribution, 245
  error term for, 303-310, 327-335
  for group variances, 245-247
  of independent estimates, 249, 252-253
  and t, 260, 268, 289
  table of, 389-391
Factorial design, 338
Fiducial limits, 98
Finite universe, 99-100
Fisher, R. A., 64, 147, 244, 356, 385-391
Fitting of line, 126-130
Form vs. form reliability, 157-158, 290-294, 310
Fourfold point correlation, 202-203
Fourfold table, 56-57, 198
  and changes, 57, 228-230
  chi square for, 206, 224
  and contingency, 203, 206
  exact probability for, 241-242
  and point correlation, 202, 206
  and tetrachoric r, 197-202
Frequency, 6
  as area, 9-10
  comparison, see Chi square
  cumulative, 10
  curve, 8
  distribution, 6
  polygon, 8
  table, 6

Goodness of fit, 224, 236-238
Graduated series, 5
Graphic presentation, 7-12
  histogram, 7
  line graph, 11-12
  ogive, 10
  polygon, 8
Grouping, 5-6
  and coding, 18
  correction for, 25
Guessed average, 18

Heterogeneity and correlation, 149-150, 159, 164-167, 347
Histogram, 7
Homoscedasticity, 131, 248
  test of, 248
Horst, P., 337n
Hypotheses, 50, 61
  alternate, 61-62
  null, 56, 61-62
  one- vs. two-tailed, 62-64, 112, 231-232, 246-247
  research, 61
  statistical, 61
Hypothesis testing, 49
  by binomial, 49-53
  by chi square, 220-221
  by F distribution, 245-247, 265
  by normal distribution, 52, 58, 78
  by t distribution, 106-114

Independence, test of, 223
Indexes:
  correlation of, 161-163
  mean of, 162
  standard deviation of, 162
Inference, statistical, 2, 50, 94



Interaction, 283, 297, 301-303, 315, 337-338
  and group profiles, 335
Intervals:
  grouping, 5-6
  limits of, 6-7
  midpoints of, 7
  size of, 0
Intraclass correlation, 280

Kelley, T. L., 181, 205
Kendall, M. G., 210
Kurtosis, 13, 27-30

Latin square design, 338-342
Level:
  of confidence, 98-99
  of significance, 51, 64-70, 99, 113-114
Line graph, 11-12
Linearity of regression, 126
  test for, 268-275

McCall, W. A., 39
McNemar, Q., 379n
Mann-Whitney U test, 359-360
Matched groups, by means of:
  matched distributions, 364-365
  paired cases, 59-60, 86, 89, 364, 365
  randomization, 364-365
  siblings and twins, 86, 364, 366
Mean, 16
  for combined groups, 19
  computation, 17-18
  sampling error of, 76-78, 104
Mean difference, significance of, 79-80, 83-85, 108-109
Measurement error, 150-154, 290-294
Median, 14-16
“Median” test, 358
Mentzer, E. G., 304n
Midpoint of interval, 7, 14
Mode, 14
Models in analysis of variance, 304-306, 327-328
Moments, 27-28
Mood, A. M., 359
Moses, L., 360
Moving averages, 8
Mueller, C. G., 357n
Multiple correlation, 175-176
  in covariance, 354-355
  and determinants, 180-181
  and diminishing returns, 188
  and discriminant function, 211
  Doolittle method, 181-185
  error of estimate, 174-176
  interpretation of, 176
  limitations, 186-188
  notation, 189-190
  numerical solution, 181-185
  regression equations, 172-174, 180
  relative weights in, 176-177
  sampling error of, 185-186, 276-279
  selection fallacy, 187
  and shrinkage, 186, 279
  and suppressant variable, 188-189

Nonlinearity, test of, 268-275
Nonparametric methods, 357-360; see also under Distribution-free
Normal correlation, 141-142
Normal distribution curve, 33
  area under, 34-36
  equations for, 33, 35
  and probability, 46-49
  table of, 382-383
  unit form of, 35
Null hypothesis, 56, 61-62

Ogive, 10
One- vs. two-tailed tests, 62-64, 112
  binomial, 52, 60, 241
  chi square, 231-232
  F ratio, 246-247

Paired cases, 86, 89, 364, 365
Parameter, 2
Partial correlation, 165-167
  sampling error of, 167
Part-whole correlation, 164
Paterson, D. G., 182n
Paull, A. E., 336
Pearson, K., 32, 142n, 198n, 221n
Percentage, see Proportion
Percentile, 20-21
Peters, C. C., 112
Point biserial correlation, 196-197
Point series, 46
Power of a test, 68-69
Prediction, error of, 131-136, 174-176
Probability, 42
  addition theorem, 42
  approximations to, 46-49, 240-241
  as area, 48-49
  and binomial, 43-46



Probability (Continued)
  exact, 240-242
  and hypothesis testing, 49-53
  as level of significance, 51
  multiplication theorem, 42
  of type I error, 65, 67
  of type II error, 67-70
Probable error, 102-103
Product moment correlation, 118
  assumptions, 126, 131, 136-137, 139-140, 141, 143
  computation, 119-121
  direction of, 134-135
  interpretations, in terms of:
    common elements, 140-141
    error of estimate, 131-136
    normal surface, 141-142
    rate of change, 130
    variance explained, 137-140
  limits for, 142, 160-161, 167
  and prediction, 126-127, 130
  and regression, 130
  sampling error of, 145-147, 264-268
  scatter diagram, 116-118, 122-126
Profiles and interaction, 335
Proportion, sampling error of, 53-54
Proportions as means, 101-102

Quartile, 20
Quartile deviation, 19-20
Quota sampling, 363

Random sampling, 55, 75, 361-362
Randomization, 364-365
Range, 6, 19
Rank correlation, 208-210
  Kendall’s tau, 210
  Spearman’s rho, 208
  significance of, 210
Regression, 130
  coefficients, 130, 176
  equations, 129-130, 172, 174, 180
  test of linearity of, 268-275
Reliability, 151-159, 290-294
  and attenuation, 159-160
  coefficient of, 151
  of difference scores, 154
  error of measurement, 153, 294
  form vs. form, 157-158
  range, effect of, 159
  significance of, 290-294, 310
  split-half, 156-157
  test-retest, 156
Renshaw, M. J., 299
Replication, 304-305
Residuals, 138, 176, 267, 287, 339
Rider, P. R., 304n

Saffir, M., 199n
Sampling, 55
  distribution, 53, 76
    binomial as, 53
    of chi square, 213-221
    empirical demonstration of, 73-75
    of F, 245
    of t, 105-106
  empirical demonstration, 73-75
  errors, reduction of, 88-90, 361-366
  for experimental and control groups, 89-90, 363-366
  from finite universe, 99-100
  independence of units, 99, 222
  size of sample required, 70, 90, 113-114
  from skewed universe, 100-101
  small samples, 58, 71, 104
  successive, 76
  techniques, 361-363
    area, 363
    quota, 363
    random, 55, 75, 361-362
    stratified, 362-363
    systematic, 362
  theory, 55, 75-78
  variance, 76
Scatter diagram, 116-118, 122-126
Sheppard’s correction, 25
Shrinkage of multiple r, 186, 279
Sign test, 357
Significance, 51
  choice of level, 64-70
  of correlation, 145-147, 264-268
  of correlation ratio, 262-264, 272-275
  of differences:
    for changes, 90-94
    for correlations, 148
    for means:
      correlated, 85, 108-109, 288-290, 317, 325
      independent, 87, 109-110, 252-262
      sub- vs. total group, 100



Significance, of differences (Continued)
    for proportions:
      correlated, 56-60, 228-230, 232-233
      independent, 60-61
    for scores, 154
    for standard deviations, 88, 243-248
    for variances:
      Bartlett's test, 247-248
      correlated, 243-244
      independent, 244-248
  and erroneous conclusions, 65-70
  of interaction, 301, 306, 308, 309, 324, 328-335
  levels, 51, 64-70
  of multiple r, 185-186, 276-279
  of nonlinearity, 268-275
  of reliability, 290-294, 310
Skewness, 13, 27-30
  of binomial distribution, 45
  causes of, 30-31
  of sampling distributions:
    of correlations, 146-147
    of proportions (or percentages), 54
    of standard deviations, 105
Small sample treatment, 58, 71, 104
  of correlation, 146
  of difference:
    for correlated means, 108-109
    for independent means, 109-110
    for variances, 243-248
  of single mean, 107-108
  see also Analysis of variance
Smoothing, 8
Snedecor, G. W., 245
Spearman-Brown formula, 156-157
Split-half reliability, 156-157
Spurious correlation, 163, 164
Squares and square roots, 392-400
Standard deviation, 21
  for combined groups, 26
  computation, 22-25
  sampling error of, 81, 243
  Sheppard’s correction, 25
Standard error, 53, 76
  of average deviation, 81
  of correlation measures:
    biserial, 194
    multiple, 185
    product moment, 145
    tetrachoric, 200
    z (transformed r), 147
  of kurtosis, 82
  of mean, 77, 81
    from finite universe, 100
    for stratified sample, 363
  of mean difference, 79, 85
  of median, 81
  of proportion, 53-54
    from finite universe, 100
    for stratified sample, 362
  of quartile deviation, 81
  of skewness, 82
  of standard deviation, 81
  of z (transformed r), 147
Standard error of difference, 59, 83
  for changes, 92
  for means:
    correlated, 85, 364-365
    independent, 87
    sub- vs. total group, 100
  for medians, 88
  for proportions:
    correlated, 56-60
    independent, 60-61
  for scores, 154
  for standard deviations, 88
  for z’s (transformed r’s), 148
Standard error of estimate, 131-136, 174-176
Standard error of measurement, 150-154, 290-294
Standard score, 34, 37-39
  and T score, 39
Statistic, 2
Stratified sampling, 362-363
“Student,” 366
Successive sampling, 76
Sum of squares, 25, 107; see also under Analysis of variance
Suppressant variable, 188-189

t ratio, 105
  assumptions and limitations in use of, 112-114
  and confidence limits, 107-108, 110
  for correlation, 146
  and critical ratio (CR), 109
  degrees of freedom, 106-107, 111
  for difference:
    in correlated correlations, 148
    in correlated means, 108-109
    in correlated variances, 243-244
    in independent means, 109-110



t ratio (Continued)
  distribution of, 105-106
  and F, 260, 268, 289
  for rank correlation, 210
  for single mean, 107-108
  table of, 388
T score, 39
Tabulation, 5-6
Taubman, R. E., 321n
Test-retest reliability, 156
Tetrachoric correlation, 197-202
  computing diagrams for, 199
  formula, 199
  sampling error of, 200-201
Thorndike, R. L., 355n
Thurstone, L. L., 199n
Transformation:
  mathematical, 191, 357
  standard scores, 37-38
  T scaling, 39
True score, 151
Two-tailed tests, 62-64, 112

U test, 359-360

Van Voorhis, W. R., 112
Variance, 22
  additive nature of, 137-138
  and chi square, 243
  computation, 22-25
  and correlation, 137-140, 176
  of difference, 137-138
  difference between, 243-248
  estimate of, 94
  homogeneity of, 248
  ratio, see F
  sampling of, 243
  of sum, 137-138
  theorem, 137-138
  See also Analysis of variance
Variation, 13, 19
  average deviation, 21
  coefficient of, 161
  quartile deviation, 19-20
  standard deviation, 21

Walker, E. L., 295, 301
Wright, Suzanne T., 256, 281

Yates, F., 363n, 385-391
Yates’s correction for continuity, 230-231

z, for difference between standard deviations, 244
z score, 34, 37-39
z transformation for r, 147
  tables of, 384, 385



